2018-02-16 05:52 AM
Hello
In my previous experience with Atmel 8-bit MCUs, which have 1 MIPS/MHz performance, I had exactly 1 executed instruction per system clock tick.
Now I'm using the STM32F103. I noted from the datasheet that its performance is 1.25 DMIPS/MHz. So I wrote a small assembler program, in short:
LDR param0, [R6] ; param0 receiver, R6 contains address in periph bit-band
STR param0, [R7], #4 ; R7 contains address in SRAM bit-band
B Loop
There are no prescalers on AHB or on APB1/2. I loaded this small code into the embedded SRAM, set the flash latency to 0, disabled the flash prefetch buffer, and turned off all interrupts and DMA.
Then I measured how fast this code executes from SRAM. The result is that one instruction takes 4 system clock ticks (the branch takes 8), so the actual performance is 0.25 MIPS/MHz.
What did I do wrong? Or what have I misunderstood?
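For reference, the arithmetic behind the 0.25 MIPS/MHz figure can be sketched as below. This is only an illustration, not part of the original post: the helper name `mips_per_mhz` is made up, and it assumes every instruction in the loop takes the same number of clock cycles, which the rest of the thread shows is not generally true on this core.

```python
def mips_per_mhz(cycles_per_instruction: float) -> float:
    """Instructions completed per clock cycle, which equals MIPS per MHz."""
    if cycles_per_instruction <= 0:
        raise ValueError("cycles_per_instruction must be positive")
    return 1.0 / cycles_per_instruction


# 8-bit case described in the post: one instruction per clock tick.
avr = mips_per_mhz(1)

# Measured for this particular load/store loop on the F103: 4 ticks per instruction.
f103_loop = mips_per_mhz(4)

print("AVR:", avr, "MIPS/MHz")
print("F103 load/store loop:", f103_loop, "MIPS/MHz")
```

Note that this number describes only this load/store loop, not the core in general.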
2018-02-16 08:22 AM
First of all I checked it with simulators.
Second of all, I measured it with my oscilloscope connected to a real device (a number of consecutive accesses to the IOs and then a branch; the duration of the branch execution is clearly visible on the oscilloscope).
2018-02-16 08:43 AM
First of all I checked it with simulators.
I know of no cycle-perfect simulator of STM32s out in the wild.
Second of all, I measured it with my oscilloscope connected to a real device (a number of consecutive accesses to the IOs and then a branch; the duration of the branch execution is clearly visible on the oscilloscope).
How do you know how long the branch takes and how long the IO accesses take?
I repeat, this is NOT simple.
JW
2018-02-16 08:55 AM
You're right,
my calculations were based on LDR=STR=2B, so if that is not the case, I was wrong.
So, in summary, there is no 1.25 MIPS and there never will be, and 1.25 DMIPS is much more than the real MIPS. And the rhetorical question is: what reason do I have to know the performance in DMIPS (and see it on the first page of the datasheet) if I can't use this value in practice? :\
2018-02-16 08:58 AM
No wait, for example:
STR #1 to IO
STR #0 to IO
STR #1 to IO
STR #0 to IO
.. (and so on few times)
B Loop
So the oscilloscope shows me:
_| |_| |_| |___| |_| |_
             ^
             duration of the branch instruction.
Very transparent. This is how I measure the performance of 8-bit MCUs. The F103 gave me the same picture.
2018-02-16 09:58 AM
'How can DMIPS be > MIPS?'
Just as one pound of apples != one pound of oranges.
2018-02-16 10:07 AM
Load/Store are going to be relatively expensive and dependent on external buses.
The processor is pipelined, so it can dispatch one instruction per cycle; this is 'throughput' rather than execution time.
Do register-to-register math/manipulation and it will work much quicker, and with wider data, than the old 8-bit designs.
Things like multiply and divide, especially wide ones, are going to completely spank 8-bit micros.
2018-02-16 10:09 AM
DMIPS is a measure of computation work of equivalent weight, not an instruction cycle counting task.
2018-02-16 10:10 AM
The mass is the same, but they do look and taste different...
2018-02-16 06:34 PM
Yes, it is not simple at all. I performed a test:
    MOVW R0, #(1 << 12)
    LDR R1, =GPIOB_BRR
    LDR R2, =GPIOB_BSRR
Blink_Loop
    STRH R0, [R2]
    STRH R0, [R1]
    STRH R0, [R2]
    STRH R0, [R1]
    STRH R0, [R2]
    STRH R0, [R1]
    STRH R0, [R2]
    STRH R0, [R1]
    STRH R0, [R2]
    STRH R0, [R1]
    B Blink_Loop
Other conditions are the same: no interrupts, DMA, prescalers etc. Freq is 16MHz HSE (clock every 0,0625us). And I get this picture on my oscilloscope:
Pulse [0,5 us], Pause [0,75 us], Pulse [0,5 us], Pause [0,5 us], Pulse/Pause [0,25 us] Branch pause [> 1us]
So the fastest instruction takes 4 system clock ticks, and the branch takes 16 O_o
I really, really, really don't get it... maybe I'm just too old to figure out brand new things in these SoCs.
But what I actually figured out is that the SoC cannot complete the task I wanted it to... Very sad
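The conversion from the scope timings above to clock ticks can be sketched like this. It is just the arithmetic at the stated 16 MHz clock; the function and dictionary names are made up for the example, and the branch entry uses 1 us as a lower bound because the post only says "> 1us".

```python
def duration_to_cycles(duration_us: float, clock_mhz: float) -> int:
    """Convert a measured duration in microseconds to whole clock cycles."""
    return round(duration_us * clock_mhz)


CLOCK_MHZ = 16  # HSE frequency stated in the post

# Durations read off the oscilloscope, in microseconds.
measured_us = {
    "first pulse": 0.5,
    "first pause": 0.75,
    "second pulse": 0.5,
    "second pause": 0.5,
    "later pulses/pauses": 0.25,
    "branch pause (lower bound)": 1.0,
}

cycles = {name: duration_to_cycles(t, CLOCK_MHZ) for name, t in measured_us.items()}

for name, n in cycles.items():
    print(f"{name}: {n} cycles")
```

This only restates the measurements in cycles. It says nothing about why the first stores take longer than the later ones or how the branch time splits between the core and the bus.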
2018-02-16 08:22 PM
What was the task?
You picked a very old processor to start with.
The STM32F series has been around for over 10 years?