www.colecovision.eu

Philipp Krause
Senior II
Posted on February 11, 2018 at 20:28

stdcbench, a new benchmark for embedded systems and results for the STM8

For benchmarking C implementations, there are a few benchmarks, but they all have their problems. Many have memory requirements that are far too high, or need functionality that is not necessarily available.

Some are quite one-sided in what they measure: Whetstone, Dhrystone, Coremark all pose as generic benchmarks, but in reality, at least for 8-bit systems, scores are dominated by one single aspect (floating-point performance for Whetstone, string processing for Dhrystone, integer multiplication for Coremark).

So, I decided to write a new benchmark, stdcbench. I wanted it to be suitable for small systems (4 KB of RAM, about 32 KB of Flash). There is a trade-off here, since all the data and code will fit easily into caches on bigger systems, but IMO it is worth it.

The current version consists of two modules, which on typical systems should contribute about equally to the score.

c90base:

It benchmarks a commonly implemented subset of what the standard requires for freestanding implementations of C90. It consists of three submodules:

1) Huffman/RLE decompression (adapted from real-world code)

2) Integer matrix multiplication (synthetic)

3) Insertion sort (adapted from real-world code)
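
To give an idea of the kind of code c90base exercises, here is a minimal sketch of two of the three kernels in plain C90 style. It is not taken from stdcbench; the function names insertion_sort and matmul, and the flat row-major matrix layout, are my own assumptions for illustration.

```c
#include <stddef.h>

/* Sort a[0..n-1] in ascending order with insertion sort.
   C90 style: declarations at the top of the block, no C99 features. */
void insertion_sort(int *a, size_t n)
{
    size_t i, j;
    int key;

    for (i = 1; i < n; i++) {
        key = a[i];
        /* Shift larger elements one slot to the right to open a gap for key. */
        for (j = i; j > 0 && a[j - 1] > key; j--)
            a[j] = a[j - 1];
        a[j] = key;
    }
}

/* Multiply two n x n integer matrices stored row-major in flat arrays:
   c = a * b. The output c must not overlap a or b. */
void matmul(int *c, const int *a, const int *b, size_t n)
{
    size_t i, j, k;
    int sum;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            sum = 0;
            for (k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

On an 8-bit target the inner loop of matmul is mostly integer multiplication and index arithmetic, while insertion_sort is mostly comparisons and memory moves, so the two stress different parts of the code generator.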

c90lib:

Benchmarks the standard library.

It consists of two submodules:

1) Computation of lnlc-width (adapted from real-world code).

2) Peephole optimizer (simplified from real-world code).

C99 features (e.g. bool, restrict) are used where available, but are not required.
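
One common way to use such features only where available is to test __STDC_VERSION__. The following is a sketch under that assumption, not the actual stdcbench source; the macro BENCH_RESTRICT and the helper functions are hypothetical names.

```c
#include <stddef.h>

/* Use C99 bool and restrict when the compiler reports C99 or later;
   otherwise fall back to C90-compatible substitutes.
   BENCH_RESTRICT is a hypothetical name chosen for this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdbool.h>
#define BENCH_RESTRICT restrict
#else
typedef unsigned char bool;
#define true 1
#define false 0
#define BENCH_RESTRICT
#endif

/* Copy n bytes from src to dst. With restrict the compiler may assume
   that the two buffers do not overlap. */
void copy_bytes(unsigned char *BENCH_RESTRICT dst,
                const unsigned char *BENCH_RESTRICT src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Return true if all n bytes at p are zero. */
bool all_zero(const unsigned char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return false;
    return true;
}
```

The same source then compiles on a C90-only compiler and still gives a C99 compiler the extra aliasing information.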

Modules can be disabled where implementation limitations make that necessary.

Modules for floating-point are planned for the future, but the current benchmark is already quite usable.

So far, stdcbench seems to achieve its goals: it benchmarks a wide range of important standard C functionality without giving too much emphasis to any particular aspect.

Here are a few scores from an STM8AF5288 at 16 MHz:

SDCC 3.7.0 RC1 with optimization for code size (-mstm8 --opt-code-size --max-allocs-per-node 100000), binary size 20953 B:

stdcbench 0.3

stdcbench c90base score: 106

stdcbench c90lib score: 87

stdcbench final score: 193

SDCC 3.7.0 RC1 with optimization for code speed (-mstm8 --opt-code-speed --max-allocs-per-node 100000), binary size 21083 B:

stdcbench 0.3

stdcbench c90base score: 109

stdcbench c90lib score: 88

stdcbench final score: 197

IAR 3.10.1.201 with optimization for code size, binary size 24288 B:

stdcbench 0.3

stdcbench c90base score: 117

stdcbench c90lib score: 71

stdcbench final score: 188

IAR 3.10.1.201 with optimization for code speed, binary size 27268 B:

stdcbench 0.3

stdcbench c90base score: 197

stdcbench c90lib score: 100

stdcbench final score: 297

Cosmic 4.4.4 with optimization for code size, binary size 8564 B:

stdcbench 0.3

stdcbench c90base score: 116

stdcbench final score: 116

Cosmic 4.4.4 with optimization for code speed, binary size 8598 B:

stdcbench 0.3

stdcbench c90base score: 123

stdcbench final score: 123

For comparison, here is the output for a high-end 8051-derivative, the C8051F120 at 98 MHz (compiled with SDCC 3.7.0 RC1 using -mmcs51 --model-large --stack-auto --opt-code-speed --max-allocs-per-node 10000), binary size 13282 B:

stdcbench 0.3

stdcbench c90base score: 96

stdcbench final score: 96

For Cosmic and the C8051F120, the c90lib module was disabled due to implementation limitations.¹ That of course results in lower scores and means the code size is not comparable to IAR/SDCC.

These results are quite interesting when compared with the results at http://www.colecovision.eu/stm8/compilers.shtml. In particular, while SDCC is ahead in Dhrystone and Coremark scores, it apparently falls behind in stdcbench scores. On the other hand, SDCC seems to do better in code size for stdcbench.

http://www.stdcbench.org

Philipp

¹ For Cosmic, the problem is the lack of qsort() in the standard library. That issue could be fixed easily, and I hope it will be. For the C8051F120, the c90lib module runs out of stack space (the MCS-51 architecture has an 8-bit stack pointer).
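
For readers unfamiliar with it, this is the kind of standard qsort() call that code depending on the library needs. It is a generic sketch, not the code from c90lib; cmp_int and sort_ints are hypothetical names.

```c
#include <stdlib.h>

/* Comparison callback for qsort(): returns a negative value, zero, or a
   positive value if *pa is less than, equal to, or greater than *pb.
   Comparing instead of subtracting avoids integer overflow. */
int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;

    return (a > b) - (a < b);
}

/* Sort n ints in ascending order using the standard library. */
void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_int);
}
```

A library without qsort() cannot link code like this, which is why the whole module has to be disabled there.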

#compiler #iar #cosmic #c8051 #sdcc #stm8
1 REPLY 1
Philipp Krause
Senior II
Posted on June 14, 2018 at 12:48

A few more scores and details on the benchmark can be found in the SCOPES 2018 paper:

https://dl.acm.org/citation.cfm?id=3207726

 

Philipp