How to obtain 1.25 MIPS

Roman K · 2018-02-16

Posted on February 16, 2018 at 14:52

Hello

From my previous experience with Atmel 8-bit MCUs, which deliver 1 MIPS/MHz, I got exactly one executed instruction per system clock cycle.

Now I'm using an STM32F103. I noticed in the datasheet that its performance is 1.25 DMIPS/MHz, so I wrote a small assembler program; in short:

Loop
    LDR  param0, [R6]        ; param0 receives the value; R6 holds an address in the peripheral bit-band region
    STR  param0, [R7], #4    ; R7 holds an address in the SRAM bit-band region, post-incremented by 4
    B    Loop

There are no prescalers on AHB, APB1 or APB2. I loaded this small program into the embedded SRAM, set the flash latency to 0, disabled the flash prefetch buffer, and turned off all interrupts and DMA.

Then I measured how fast this code executes from SRAM. Each instruction takes 4 clock cycles (the branch takes 8), so the actual performance is 0.25 MIPS/MHz.
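The 0.25 MIPS/MHz figure is simply instructions executed divided by clock cycles spent. A minimal C sketch of that arithmetic, using the cycle counts reported above (4 cycles per load/store, 8 for the branch); the function names are hypothetical helpers, not part of any ST or CMSIS library:

```c
#include <stddef.h>

/* Instructions executed per clock cycle. This is numerically equal to
 * MIPS per MHz: 1 MHz = 1e6 cycles/s, so (instr / cycles) * 1e6 instr/s
 * for every MHz of clock. */
double mips_per_mhz(unsigned long instructions, unsigned long cycles)
{
    if (cycles == 0)
        return 0.0; /* empty measurement: avoid dividing by zero */
    return (double)instructions / (double)cycles;
}

/* Total cycles for one pass of a loop, given each instruction's cost. */
unsigned long loop_cycles(const unsigned long *cost, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += cost[i];
    return total;
}
```

For the whole three-instruction loop (LDR + STR + B, i.e. 4 + 4 + 8 = 16 cycles) the same figures give 3 / 16 = 0.1875 instructions per cycle, even lower than the 0.25 seen on straight-line loads and stores.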

What did I do wrong, or what did I misunderstand?

Roman K · 2018-02-16

Posted on February 16, 2018 at 16:22

First of all, I checked it with simulators.

Second, I measured it with my oscilloscope connected to the real device (a number of consecutive accesses to I/O pins followed by a branch; the duration of the branch is clearly visible on the scope).

waclawek.jan · 2018-02-16

Posted on February 16, 2018 at 16:43

'First of all I checked it with simulators.'

I know of no cycle-perfect simulator of STM32s out in the wild.

'Second of all I calculated it with my oscilloscope, connected to the real device (some count of consecutive accesses to IOs and then a branch; the duration of the branch execution is clearly visible on my oscilloscope)'

How do you know how long the branch takes and how long the I/O accesses take?

I repeat, this is NOT simple.

JW

Roman K · 2018-02-16

Posted on February 16, 2018 at 16:55

You're right. My calculations were based on LDR=STR=2B, so if that isn't the case, I was wrong.

So, in summary, there is no 1.25 MIPS/MHz and never will be, and 1.25 DMIPS/MHz is much more than the real MIPS. The rhetorical question is: what is the point of knowing the performance in DMIPS (and seeing it on the first page of the datasheet) if I can't use this value in practice? :\

Roman K · 2018-02-16

Posted on February 16, 2018 at 16:58

No, wait. For example:

    STR #1 to IO
    STR #0 to IO
    STR #1 to IO
    STR #0 to IO
    ... (and so on, a few times)
    B   Loop

So the oscilloscope shows me:

    _| |_| |_| |___| |_| |_
                ^^^
                duration of the branch instruction

Very transparent. This is how I measure the performance of 8-bit MCUs, and the F103 gave me the same kind of picture.

henry.dick · 2018-02-16

Posted on February 16, 2018 at 17:58

'How can DMIPS be > MIPS?'

Just as one pound of apples != one pound of oranges.

Tesla DeLorean · 2018-02-16

Posted on February 16, 2018 at 18:07

Loads and stores are going to be relatively expensive and dependent on the external buses.

The processor is pipelined, so it can dispatch one instruction per cycle; this is 'throughput' rather than execution time.

If you do register-to-register math/manipulation, it will work much quicker, and on wider data, than the old 8-bit designs.
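To separate throughput from execution time, here is a deliberately idealized model in C: a k-stage in-order pipeline with no stalls, no wait states and no taken branches, where N instructions finish in N + k - 1 cycles. The Cortex-M3 core in the F103 has a 3-stage (fetch/decode/execute) pipeline, but this sketch ignores exactly the bus wait states that loads and stores to peripherals run into, so treat it as an illustration only; the function names are hypothetical:

```c
/* Idealized in-order pipeline: one instruction enters per cycle, each
 * spends one cycle in each of `stages` stages, and nothing ever stalls.
 * Latency of a single instruction = stages cycles;
 * total for n back-to-back instructions = n + stages - 1 cycles. */
unsigned long pipeline_cycles(unsigned long n_instr, unsigned long stages)
{
    if (n_instr == 0 || stages == 0)
        return 0; /* nothing to run, or no pipeline to model */
    return n_instr + stages - 1;
}

/* Instructions completed per cycle; tends towards 1.0 as n grows. */
double pipeline_throughput(unsigned long n_instr, unsigned long stages)
{
    unsigned long cycles = pipeline_cycles(n_instr, stages);
    return cycles ? (double)n_instr / (double)cycles : 0.0;
}
```

With 3 stages, a single instruction takes 3 cycles from fetch to completion, yet 100 back-to-back instructions take 102 cycles, about 0.98 instructions per cycle. Any stall (a slow bus access, a pipeline refill after a branch) adds cycles on top of this ideal.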

Things like multiply and divide, especially wide ones, are going to completely spank 8-bit micros.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · 2018-02-16

Posted on February 16, 2018 at 18:09

DMIPS is a measure of computational work of equivalent weight, not an instruction-cycle counting task.
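DMIPS comes from the Dhrystone benchmark: by convention the VAX 11/780's score of 1757 Dhrystone iterations per second is defined as 1 DMIPS, and DMIPS/MHz divides that by the clock frequency in MHz. A small C sketch of the conversion (the function names are illustrative):

```c
/* Reference score: 1757 Dhrystone iterations per second = 1 DMIPS
 * (the VAX 11/780 result the metric is normalized to). */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Dhrystone iterations per second -> DMIPS. */
double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Normalize by clock frequency to get the datasheet-style DMIPS/MHz. */
double dmips_per_mhz(double dhrystones_per_sec, double clock_mhz)
{
    return dmips(dhrystones_per_sec) / clock_mhz;
}
```

At the F103's maximum 72 MHz, a 1.25 DMIPS/MHz rating corresponds to 1.25 * 72 = 90 DMIPS, or 158,130 Dhrystone iterations per second. The figure describes how quickly compiled benchmark code runs, not how many cycles an individual instruction such as a store to a GPIO register takes.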


Tesla DeLorean · 2018-02-16

Posted on February 16, 2018 at 18:10

The mass is the same, but they do look and taste different...


Roman K · 2018-02-16

Posted on February 17, 2018 at 03:34

Yes, it is not simple at all. I performed another test:

    MOVW  R0, #(1 << 12)     ; bit mask for PB12
    LDR   R1, =GPIOB_BRR     ; writing the mask here resets PB12
    LDR   R2, =GPIOB_BSRR    ; writing the mask here sets PB12
Blink_Loop
    STRH  R0, [R2]
    STRH  R0, [R1]
    STRH  R0, [R2]
    STRH  R0, [R1]
    STRH  R0, [R2]
    STRH  R0, [R1]
    STRH  R0, [R2]
    STRH  R0, [R1]
    STRH  R0, [R2]
    STRH  R0, [R1]
    B     Blink_Loop

Other conditions are the same: no interrupts, DMA, prescalers, etc. The clock is 16 MHz from the HSE (one clock every 0.0625 µs). I get this picture on my oscilloscope:

Pulse [0.5 µs], Pause [0.75 µs], Pulse [0.5 µs], Pause [0.5 µs], Pulse/Pause [0.25 µs], Branch pause [> 1 µs]

So the fastest instruction takes 4 clock cycles, and the branch takes 16 O_o
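The scope readings convert to clock cycles directly: at f MHz there are f cycles per microsecond, so at 16 MHz one cycle is 0.0625 µs. A minimal C helper (hypothetical name) that rounds a measured interval to the nearest whole cycle:

```c
/* Convert a measured time interval to CPU clock cycles, rounded to the
 * nearest whole cycle. clock_mhz is the number of cycles per microsecond.
 * Assumes non-negative inputs (adding 0.5 then truncating rounds them). */
long us_to_cycles(double interval_us, double clock_mhz)
{
    return (long)(interval_us * clock_mhz + 0.5);
}
```

Applied to the trace above: 0.5 µs -> 8 cycles, 0.75 µs -> 12, 0.25 µs -> 4, and a branch pause of more than 1 µs -> more than 16 cycles. Identical STRH instructions thus show gaps of 4, 8 and 12 cycles, which at least suggests the timing depends on more than the instruction alone.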

I really, really, really don't get it... maybe I'm just too old to figure out brand-new things in these SoCs.

But what I actually figured out is that the SoC cannot complete the task I wanted it to... Very sad.

T J · 2018-02-16

Posted on February 17, 2018 at 04:22

What was the task?

You picked a very old processor to start with.

The STM32F series has been around for over 10 years?