In my previous experience with Atmel 8-bit MCUs, which deliver 1 MIPS/MHz, exactly one instruction executed per systick.
Now I'm using an STM32F103. The datasheet gives its performance as 1.25 DMIPS/MHz, so I wrote a small assembler program. In short:
LDR param0, [R6] ; param0 receives the value, R6 holds an address in the peripheral bit-band region
STR param0, [R7], #4 ; R7 holds an address in the SRAM bit-band region
B Loop ;
There are no prescalers on AHB or on APB1/2. I loaded this small program into the embedded SRAM, set the flash latency to 0, disabled the flash prefetch buffer, and turned off all interrupts and DMA.
Then I measured how fast this code executes from SRAM. The result is that one instruction takes 4 systicks (the branch takes 8), so the actual performance is 0.25 MIPS/MHz.
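To make the arithmetic behind that figure explicit, here is a minimal host-side C sketch of how I get it. The function names are hypothetical, chosen only for illustration. It assumes the tick counts measured above (4 for LDR, 4 for STR, 8 for the branch) and treats instructions per tick as numerically equal to MIPS/MHz.

```c
#include <assert.h>

/* Instructions executed per clock tick.
 * Assumption: this ratio is numerically the same as MIPS/MHz. */
double mips_per_mhz(unsigned instructions, unsigned ticks)
{
    assert(ticks > 0); /* a zero tick count would make the ratio meaningless */
    return (double)instructions / (double)ticks;
}

/* Total ticks for one pass through the loop: LDR + STR + B. */
unsigned loop_ticks(unsigned ldr, unsigned str, unsigned branch)
{
    return ldr + str + branch;
}
```

With these numbers, mips_per_mhz(1, 4) gives 0.25 for a single load or store. Counting the whole loop, loop_ticks(4, 4, 8) is 16 ticks for 3 instructions, so mips_per_mhz(3, 16) gives 0.1875.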
What did I do wrong? Or what have I misunderstood?