For benchmarking C implementations, there are a few existing benchmarks, but they all have their problems. Many have memory requirements that are far too high for small systems, or need functionality that is not necessarily available.
Some are quite one-sided in what they measure: Whetstone, Dhrystone and Coremark all pose as generic benchmarks, but in reality, at least on 8-bit systems, each score is dominated by a single aspect (floating-point performance for Whetstone, string processing for Dhrystone, integer multiplication for Coremark).
So I decided to write a new benchmark, stdcbench. I wanted it to be suitable for small systems (4 KB of RAM, about 32 KB of Flash). There is a trade-off here, since all the data and code will easily fit into the caches of bigger systems, but IMO it is worth it.
The current version consists of two modules, which on typical systems should contribute about equally to the score.

c90base: benchmarks a commonly-implemented subset of what the standard requires for freestanding implementations of C90. It consists of three submodules:
1) Huffman/RLE decompression (adapted from real-world code)
2) Integer matrix multiplication (synthetic)
3) Insertion sort (adapted from real-world code)
c90lib: benchmarks the standard library. It consists of two submodules:
1) Computation of lnlc-width (adapted from real-world code).
2) Peephole optimizer (simplified from real-world code).
C99 features (e.g. bool, restrict) are used where available, but are not required.
Modules can be disabled where implementation limitations make this necessary.
For the future, modules for floating-point are planned, but the current benchmark is already quite usable.
So far, stdcbench seems to achieve its goals: it benchmarks a wide range of important standard C functionality without giving too much weight to any single aspect.
Here are a few scores from an STM8AF5288 at 16 MHz:
SDCC 3.7.0 RC1 with optimization for code size (-mstm8 --opt-code-size --max-allocs-per-node 100000), binary size 20953 B:
stdcbench c90base score: 106
stdcbench c90lib score: 87
stdcbench final score: 193
SDCC 3.7.0 RC1 with optimization for code speed (-mstm8 --opt-code-speed --max-allocs-per-node 100000), binary size 21083 B:
stdcbench c90base score: 109
stdcbench c90lib score: 88
stdcbench final score: 197
IAR 126.96.36.199 with optimization for code size, binary size 24288 B:
stdcbench c90base score: 117
stdcbench c90lib score: 71
stdcbench final score: 188
IAR 188.8.131.52 with optimization for code speed, binary size 27268 B:
stdcbench c90base score: 197
stdcbench c90lib score: 100
stdcbench final score: 297
Cosmic 4.4.4 with optimization for code size, binary size 8564 B:
stdcbench c90base score: 116
stdcbench final score: 116
Cosmic 4.4.4 with optimization for code speed, binary size 8598 B:
stdcbench c90base score: 123
stdcbench final score: 123
For comparison, here is the output for a high-end 8051-derivative, the C8051F120 at 98 MHz (compiled with SDCC 3.7.0 RC1 using -mmcs51 --model-large --stack-auto --opt-code-speed --max-allocs-per-node 10000), binary size 13282 B:
stdcbench c90base score: 96
stdcbench final score: 96
For Cosmic and the C8051F120, the c90lib module was disabled due to implementation limitations.¹ That of course results in lower scores and makes the code sizes not comparable to those of IAR and SDCC.
These results are quite interesting when compared to Dhrystone and Coremark. In particular, while SDCC is ahead in Dhrystone and Coremark scores, it apparently falls behind in stdcbench scores. On the other hand, SDCC seems to do better in code size for stdcbench.
¹ For Cosmic, the problem is the lack of qsort() in the standard library, an issue that could be fixed easily, and I hope it will be. For the C8051F120, the c90lib module runs out of stack space (the MCS-51 architecture has an 8-bit stack pointer).