Armandas
Associate II
August 31, 2020
Solved

Bad compile-time and run-time performance when using STM32CubeIDE compiler

  • August 31, 2020
  • 11 replies
  • 8702 views

I am preparing for migration from SWSTM32 V2.4 (GCC 6.3.1) to STM32CubeIDE V1.4 (GCC 7.3.1).

During testing, I have noticed very significant performance issues in both compile time and the run time of the generated executable. Please see the attached table for my test results.

All tests were done using CMake and changing the compiler path, so there is no effect of different IDEs.
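A minimal sketch of how this kind of comparison could be scripted: the same CMake project is configured and built once per toolchain, and only the compiler paths change. It assumes the project's CMake setup already handles cross-compiling (for example through a toolchain file) and that CMake 3.13 or later is on the PATH. `CMAKE_C_COMPILER`, `CMAKE_CXX_COMPILER`, `-S`/`-B` and `cmake --build --parallel` are standard CMake; the helper names and compiler paths are hypothetical.

```python
import shutil
import subprocess
import time
from pathlib import Path


def cmake_commands(source_dir: Path, build_dir: Path, cc: str, cxx: str, jobs: int) -> list[list[str]]:
    """Return the configure and build commands for one toolchain.

    Only the compiler paths differ between toolchains, so no IDE is involved.
    """
    configure = [
        "cmake",
        "-S", str(source_dir),
        "-B", str(build_dir),
        f"-DCMAKE_C_COMPILER={cc}",
        f"-DCMAKE_CXX_COMPILER={cxx}",
    ]
    # --parallel 1 gives a serial build; raise it to compare parallel builds.
    build = ["cmake", "--build", str(build_dir), "--parallel", str(jobs)]
    return [configure, build]


def time_command(cmd: list[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def benchmark(toolchains: dict[str, tuple[str, str]], source_dir: Path, jobs: int = 1) -> dict[str, float]:
    """Do a clean configure + build per toolchain, timing only the build step.

    ``toolchains`` maps a label to hypothetical (C compiler, C++ compiler)
    paths, e.g. {"ARM GCC 7.3.1": ("C:/arm-gcc/bin/arm-none-eabi-gcc.exe",
    "C:/arm-gcc/bin/arm-none-eabi-g++.exe")}.
    """
    results = {}
    for index, (label, (cc, cxx)) in enumerate(toolchains.items()):
        build_dir = source_dir / f"build-{index}"
        shutil.rmtree(build_dir, ignore_errors=True)  # start from a clean tree
        configure, build = cmake_commands(source_dir, build_dir, cc, cxx, jobs)
        subprocess.run(configure, check=True)  # configuration is not timed
        results[label] = time_command(build)
    return results
```

Running it once with `jobs=1` and once with a higher value keeps serial and parallel results separate, which matters when the toolchains are compared against each other.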

NOTE: I was not able to test the generic ARM toolchain bundled with STM32CubeIDE because it could not find the header <bits/c++config.h>. I therefore downloaded GCC 7.3.1 from ARM for this comparison.

Run times are in seconds.

[Attached image: table of compile-time and run-time test results; not reproduced here]

This topic has been closed for replies.
Best answer by Armandas

11 replies

TDK
Super User
August 31, 2020

I understand "build time", but what is "executable run time"? Are you timing how long a program on an STM32 chip takes to run?

Make sure you have parallel build enabled, or at least equivalent between the options.

If you feel a post has answered your question, please click "Accept as Solution".
Armandas (Author)
Associate II
August 31, 2020

Yes, the run time is the time it takes for my test program to run on the MCU. I measured it with an oscilloscope.

Parallel build was enabled for all tests.

TDK
Super User
August 31, 2020

Not sure. I switch between whatever STM32CubeIDE uses by default and "GNU Tools Arm Embedded\9 2019-q4-major" and haven't noticed a difference.

Armandas (Author)
Associate II
August 31, 2020

I re-ran the GCC 7.3.1 tests with parallel build disabled and got the following results:

ARM GCC 7.3.1: 1:48.07

ST GCC 7.3.1: 8:28.49

The difference is staggering...
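Converting the two serial build times above to seconds puts the gap in numbers. This is plain arithmetic on the reported values; `to_seconds` is just an illustrative helper.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'm:ss.xx' (or plain 'ss.xx') duration to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


arm = to_seconds("1:48.07")  # ARM GCC 7.3.1, serial build: 108.07 s
st = to_seconds("8:28.49")   # ST GCC 7.3.1, serial build: 508.49 s

ratio = st / arm                  # how many times longer the ST build takes
slowdown_pct = (ratio - 1) * 100  # extra time relative to the ARM build

print(f"ST/ARM = {ratio:.2f}x, i.e. {slowdown_pct:.0f}% slower")
# prints: ST/ARM = 4.71x, i.e. 371% slower
```

A slow-down of roughly 370% is far beyond the ~30% overhead that ST attributes to long-path support on Windows.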

Ozone
Principal
August 31, 2020

But it's free ... :-J

You get what you pay for, if you've ever heard that proverb.

Armandas (Author)
Associate II
August 31, 2020

Thanks for the great answer...

I'm not here to complain - I can fix the issue by using an external toolchain and forget about it.

By reporting a potential issue in the ST toolchain, I'm trying to help the community. I doubt ST is interested in providing inferior tools...

mattias norlander
ST Employee
August 31, 2020

Hi @Armandas,

@Ozone is right that there are tools that, in many cases, provide better code size and execution speed than GCC in general.

What is puzzling is the extremely poor build time your benchmark shows. We already knew that GNU Tools for STM32 is about 30% slower than GCC for ARM on Windows. The reason for this expected ~30% slow-down is that our toolchain supports long paths (one of the patches that is not part of GCC for ARM). Long paths have been a huge source of problems in the past, since many example projects have "deep folder hierarchies" and customers are allowed to place the Cube repository anywhere... But your trials show a ~300% slow-down. That is not expected.

I assume this is on Windows only? On macOS/Linux, the compile time should be ~identical between GNU Tools for STM32 and GCC for ARM.

We have made a fix in make/busybox to address part of the 30% slow-down, but further fixes still need to be added elsewhere in the toolchain code.

Could you instead try importing a large example project from the CubeFW packages, building it, and comparing build times, ARM vs ST? Do you always get this result? Maybe we could compare the same example projects in two different environments to see whether we get the same result?!

Your code execution speed result also looks on the poor side. Impossible to say why without looking at your code...

Thanks for providing your feedback!

Armandas (Author)
Associate II
September 1, 2020

@mattias norlander

Thank you for the answer. I have tried a few things as suggested.

Building the above project on a Linux machine, I got identical results with ST GCC 7.3.1 and ARM GCC 7.3.1.

Back on Windows, I tried to build some different projects:

  • Another (smaller) project:
    • ST GCC: 31 s
    • ARM GCC: 12 s
  • TouchGFX Demonstration from CubeFW:
    • ST GCC: 65 s
    • ARM GCC: 42 s

The TouchGFX example does not appear to be impacted as much as our internal projects.

Armandas (Author, best answer)
Associate II
September 1, 2020

I think I got to the bottom of this.

I was able to get the build time down to ~42 seconds by making all the include paths absolute and removing almost all -I include path parameters from the GCC build commands. I guess the long-path support overhead increases with each include path parameter.

By the way, the ARM GCC build time went down to about 20 seconds.
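The guess that the overhead grows with each -I parameter can be illustrated with a deliberately simplified, hypothetical cost model. It assumes each distinct header is searched for in the -I directories in command-line order and that every directory probe pays the same path-conversion cost. Real GCC also checks the including file's directory for quoted includes and may reuse lookups within a translation unit, so the numbers are illustrative only; the function, directory and header names are invented.

```python
def include_probes(include_dirs: list[str], header_locations: dict[str, str]) -> int:
    """Count -I directory probes for one translation unit (simplified model).

    Assumption: each distinct header is looked up in the -I directories in
    command-line order until found, and each probe costs one path resolution.
    A header living outside every -I directory (e.g. a system header) is
    probed in all of them first.
    """
    probes = 0
    for found_in in header_locations.values():
        if found_in in include_dirs:
            probes += include_dirs.index(found_in) + 1
        else:
            probes += len(include_dirs)
    return probes


def build_overhead(per_probe_s: float, include_dirs: list[str], units: list[dict[str, str]]) -> float:
    """Total extra time if every probe in every translation unit pays a fixed cost."""
    return per_probe_s * sum(include_probes(include_dirs, tu) for tu in units)


# Hypothetical project: 40 -I directories, one translation unit including
# 60 headers spread round-robin over them plus 10 system headers found elsewhere.
dirs = [f"inc{i}" for i in range(40)]
tu = {f"h{n}.h": dirs[n % 40] for n in range(60)}
tu.update({f"sys{n}.h": "<system>" for n in range(10)})
print(include_probes(dirs, tu))
```

Under this model the overhead scales with the number of -I directories times the number of header lookups, and it is paid again in every translation unit because each one is compiled by a separate compiler process. That would be consistent with the large gain from removing most -I parameters, though it is not a measurement of the ST toolchain.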

MM..1
Chief III
September 1, 2020

Hello, you don't say what kind of disk your GCC and sources are stored on, which file system it uses, or whether you have antivirus enabled...

My TouchGFX project (around 700 kB when linked) took 12 s to build on an NTFS SSD, but 90 s on a network share.

mattias norlander
ST Employee
September 1, 2020

Does your project (the original one this topic was started on) contain many include paths?

The TouchGFX demo benchmark still seems a bit higher than what we have observed, but it is more reasonable than the ~300% increase...

Which CubeFW package did you use and which board example? Please point me to it like this:

C:\Users\norlandm\STM32Cube\Repository\STM32Cube_FW_F7_V1.16.0\Projects\STM32F7508-DISCO\Demonstrations\TouchGFX

Then I can run comparison between ST and ARM GCC on my computer on Windows.

Armandas (Author)
Associate II
September 1, 2020

Yes, as I posted above, removing the include paths basically resolved the issue...

Here is the demonstration I used:

C:\Users\owner\STM32Cube\Repository\STM32Cube_FW_F4_V1.24.2\Projects\STM32469I_EVAL\Demonstrations\TouchGFX

Ozone
Principal
September 1, 2020

Interesting to hear.

I have Windows environments at work and Linux at home, and I have never seen such a huge difference in compile times just from path resolution.

Perhaps an Eclipse issue?

Or does Cube use a Unix emulation layer like Cygwin?

mattias norlander
ST Employee
September 2, 2020

Hi,

CubeIDE does not use any Unix emulation.

But relative paths first have to be converted into absolute paths before they can be expressed as UNC paths, so relative paths should introduce some delay. GCC Arm does not need UNC paths since it does not support long paths. So, in theory, the UNC conversion step should make GNU Tools for STM32 somewhat slower than GCC Arm.

Nevertheless, the original results presented above are far worse than our benchmarks.
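The relative-to-absolute-to-UNC conversion described above can be sketched with Python's standard `ntpath` module, which applies Windows path rules on any OS. This assumes "UNC path" here means the Windows extended-length `\\?\` prefix, which only accepts absolute, already-normalized paths; it illustrates the general Windows technique and is not ST's toolchain code. `to_extended_length` is a hypothetical name.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"  # the \\?\ extended-length prefix


def to_extended_length(path: str, cwd: str) -> str:
    """Return the extended-length form of a Windows path.

    A relative path must first be joined to the current directory and
    normalized, because Windows does not resolve '.' or '..' inside a path
    that carries the extended-length prefix.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already converted
    if not ntpath.isabs(path):
        path = ntpath.join(cwd, path)  # extra step only relative paths need
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # network share: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_PREFIX + "UNC\\" + path[2:]
    return EXTENDED_PREFIX + path


print(to_extended_length(r"..\Drivers\CMSIS\Include", r"C:\proj\Debug"))
# prints \\?\C:\proj\Drivers\CMSIS\Include
```

Every relative include directory would go through this join-and-normalize step before any file under it can be opened, which matches the expectation that relative paths cost a little more; how often the ST toolchain actually performs the conversion is not stated in this thread.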

Regarding "execution time on target" GNU Tools for STM32 vs GCC Arm, we do expect a little bit slower execution speed in the ST toolchain because of:

  • GNU Tools for STM32: the newlib standard library is optimized for size (-Os), regardless of the project optimization level
  • GNU Tools for STM32: the newlib nano library is optimized for size (-Os), ...
  • GCC Arm: the newlib standard library is optimized for speed (-O2), ...
  • GCC Arm: the newlib nano library is optimized for size (-Os), ...
  • Additionally, GNU Tools for STM32 sets more defines for both newlib variants to reduce code size and RAM usage

And as we know, code optimized for size (fewer instructions) tends to execute more slowly.

Does this account for the full difference? Hard to tell...

Ozone
Principal
September 2, 2020

It is also possible (and not really difficult) to mess up a Windows machine's registry, especially over time.

If I'm not mistaken, environment variables (often used for paths) are stored there.

Not to mention common Windows ailments like a fragmented file system and RAM.

Pavel A.
Super User
September 5, 2020

@Ozone

Eclipse CDT seems to be quite robust and smart; its Windows support is first class.

But be sure to have a good SSD and enough RAM.

-- pa