Odd Benchmark Results Using Different gcc Optimizations
I've written a benchmark to compare the CPU execution time of float adds versus int32_t adds, and the results were surprising. The benchmark is very simple: an empty loop of 100 million iterations is timed so that its value can be subtracted from the other tests. The same loop is then timed with a float add in the body, and again with an int32_t add. The resulting times were:
Optimization -O0: 22.5 nanoseconds for both float and int32_t
Optimization -O1: 5 nanoseconds for float and 2.5 nanoseconds for int32_t
Optimization -O2: 10 nanoseconds for both float and int32_t
Optimization -O3: 10 nanoseconds for both float and int32_t
I find this odd because I ran the same benchmark on an STM32F746, where -O3 gave the fastest times and the float add and int32_t add had identical times at every optimization level.
Does anyone have an explanation for this result?
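
For reference, the benchmark code itself is not shown above. A minimal sketch of the kind of test described might look like the following. It is an assumption, not the original code: the names loop_kind and time_loop_ns are hypothetical, timing uses the standard clock() function, and the counters are declared volatile so that gcc does not delete the loops at -O1 and above. Note that volatile forces a load and a store on every iteration, so the measured time covers more than the add itself.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical names: the original benchmark code was not posted. */
enum loop_kind { LOOP_EMPTY, LOOP_FLOAT, LOOP_INT32 };

/* Returns the average CPU time per iteration, in nanoseconds. */
double time_loop_ns(enum loop_kind kind, uint32_t iterations)
{
    /* volatile keeps the optimizer from removing the loops entirely,
       at the cost of a memory load and store on each iteration. */
    volatile uint32_t i;
    volatile float f = 0.0f;
    volatile int32_t n = 0;

    if (iterations == 0) {
        return 0.0;
    }

    clock_t start = clock();
    switch (kind) {
    case LOOP_EMPTY:
        for (i = 0; i < iterations; i++) { }
        break;
    case LOOP_FLOAT:
        for (i = 0; i < iterations; i++) { f = f + 1.0f; }
        break;
    case LOOP_INT32:
        for (i = 0; i < iterations; i++) { n = n + 1; }
        break;
    }
    clock_t stop = clock();

    /* Convert clock ticks to seconds, then to nanoseconds per iteration. */
    double seconds = (double)(stop - start) / (double)CLOCKS_PER_SEC;
    return seconds * 1e9 / (double)iterations;
}
```

With this sketch, the cost of the float add would be estimated as time_loop_ns(LOOP_FLOAT, 100000000) minus time_loop_ns(LOOP_EMPTY, 100000000), and likewise for LOOP_INT32, with the file compiled once per optimization level (for example gcc -O0, then gcc -O1, and so on).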
