STM32F40x @ 168MHz wait states and execution from RAM

arne · 2013-01-16

Posted on January 16, 2013 at 10:06

Hi,

Our institute is currently using STM32F103 devices in a distributed control system. However, we are evaluating a switch to STM32F40x devices for the sake of their FPU and higher clock speed at comparable current consumption (according to the datasheet).

Toying with the Clock Configuration Tool (AN3988), I stumbled across the wait states mentioned at the full 168 MHz clock speed. Can someone point me to relevant documentation to understand how these affect performance (who is waiting for whom, and what to do about it)?
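As a rough orientation on "who is waiting for whom": the core waits for the Flash, which cannot deliver a read within a single cycle at high HCLK. The sketch below shows how the wait-state count scales with clock speed. It assumes the 2.7 V to 3.6 V supply range and one additional wait state per started 30 MHz of HCLK, as recalled from the Flash latency table in the reference manual; the function name is made up for illustration, and the numbers should be checked against the manual for the actual part and voltage.

```c
#include <stdint.h>

/* Hypothetical helper, not part of any ST library.
 * Assumption: VDD in the 2.7 V - 3.6 V range, where the Flash needs
 * one extra wait state for every started 30 MHz of HCLK
 * (0 WS up to 30 MHz, 1 WS up to 60 MHz, ..., 5 WS up to 168 MHz). */
static unsigned flash_wait_states_2v7_3v6(uint32_t hclk_hz)
{
    const uint32_t step_hz = 30000000u; /* assumed step per wait state */

    if (hclk_hz == 0u) {
        return 0u;
    }
    /* Subtract 1 so that exact multiples (30 MHz, 60 MHz, ...) stay
     * in the lower bracket. */
    return (hclk_hz - 1u) / step_hz;
}
```

Under these assumptions, `flash_wait_states_2v7_3v6(168000000u)` yields 5, which matches the figure the Clock Configuration Tool reports at full speed.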

Also, suspecting the Flash may be the bottleneck (wild guess here), would running the program from RAM improve performance in general-purpose applications?

Thanks in advance.

Arne

#stm32f40x

waclawek.jan · 2013-01-17

Posted on January 17, 2013 at 12:25

> That reminds me of that (in-)famous pseudo-benchmark the PCs in the '90s were advertised with. It matters not only how many instructions an MCU can execute, but what it actually gets done. CoreMark is such a test.

CoreMark is no better than any other benchmark. It attempts to circumvent the obvious gotcha with optimizing compilers that Dhrystone fell over, but it still has many shortcomings. One of them is that it executes only a few tight loops, i.e. it benefits disproportionately from instruction caches. The prime example is the ART we are discussing here, and the claim that it allows zero-wait-state-like execution from Flash at the full 168 MHz (which turns out to be a lie by omission of "for a particular benchmark").
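The tight-loop argument can be made concrete with a deliberately crude model. It assumes that an access which hits the accelerator's cache costs one cycle and a miss costs one cycle plus the configured wait states, and it ignores prefetching entirely; the function name and the model itself are illustrative assumptions, not ST's description of the ART.

```c
/* Hypothetical, simplified model of average cycles per Flash access.
 * Assumptions: a cache hit costs 1 cycle, a miss costs 1 cycle plus
 * the configured wait states; prefetch and bus contention are ignored. */
static double avg_cycles_per_access(double hit_rate, unsigned wait_states)
{
    double miss_rate = 1.0 - hit_rate;

    return 1.0 + miss_rate * (double)wait_states;
}
```

In this model a loop that fits entirely in the cache (hit rate 1.0) runs as if there were no wait states, while code that never hits pays all of them on every access. Real code lands somewhere in between, which is exactly why a benchmark made of a few tight loops says little about it.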

> I am not intending to run down the stm32f4.

> But in fact, most benchmark results are solely marketing instruments, and that's no invention of ST.

We all know (or should know) what the value of benchmarks is. They *are* a method of estimating performance; an experienced developer should know all the "but"s involved.

OTOH, a benchmark whose workings we don't know is pure nonsense.

> No one advertises his weak spots...

I don't care who advertises what and where. I am an engineer. I am looking for as much information as possible.

JW

frankmeyer9 · 2013-01-17

Posted on January 17, 2013 at 14:08

> CoreMark is no better than any other benchmark. It attempts to circumvent the obvious gotcha with optimizing compilers that Dhrystone fell over, but it still has many shortcomings.

So CoreMark is already better than Dhrystone.

> One of them is that it executes only a few tight loops, i.e. it benefits disproportionately from instruction caches. The prime example is the ART we are discussing here, and the claim that it allows zero-wait-state-like execution from Flash at the full 168 MHz (which turns out to be a lie by omission of "for a particular benchmark").

This "ART" cache is also available to your real-world code, so I would not call the result overly optimistic. You get overly optimistic results when benchmarking meaningless code.

Here I rate CoreMark as more realistic than Dhrystone.

> I don't care who advertises what and where. I am an engineer. I am looking for as much information as possible.

Maybe not long enough to have fallen for an advertising bubble and stumbled upon hard bugs?

Marketing people tend to call their babbling "information", too ...

Nickname12657_O · 2013-01-21

Posted on January 21, 2013 at 10:51

Hi Jan,

"But then I would have many more suggestions, corrections and typo fixes for the manuals/datasheet, and it appears that ST does not care anyway."

ST does not care?? I don't think so...

It may take time to act on your feedback, but rest assured that it is of interest to us. You just have to highlight your feedback and describe your suggestions or corrections; they will then be taken into account, depending on the internal FIFO and priorities of course ;)

Cheers,

STOne-32

lee_trueman · 2013-01-21

Posted on January 21, 2013 at 11:13

If you want true performance, maybe look at the RX600 from Renesas. Core performance is significantly better than M3/M4, and it has the fastest 0-wait-state flash on any current MCU, at 100 MHz. Oh, and it has an FPU :)

frankmeyer9 · 2013-01-21

Posted on January 21, 2013 at 11:26

But Renesas shows that yes, one can produce even more incomprehensible and unhelpful documentation than ST ...