True performance of STM32?

lspr35
Associate II
Posted on November 17, 2008 at 13:39

True performance of STM32?

#stm32
53 REPLIES
16-32micros
Associate III
Posted on May 17, 2011 at 12:19

Hi cosmap,

If you are using EWARM, you should configure the options for the highest speed optimization with maximum inlining; this should compile GPIOC->BSRR = GPIO_Pin_7; down to a single store (STR) instruction. You also need to configure APB2 to run at 72MHz, because you are writing to a peripheral on the APB: if it is not clocked at the same frequency as the CPU, the AHB <-> APB bridge adds wait states.

"If now I insert 10 x asm ("cmp r1,r2"); between the Set and Reset, I measure precisely 235.5ns (+/- 0.2ns). This means that each of the cmp instructions executed in 11ns. This is a bit faster than expected for a single cycle execution at 72MHz but it is within range."

The ten "cmp r1,r2" instructions inserted between the set and the reset execute entirely in the internal registers of the Cortex-M3, without any memory access, and they start executing before the first GPIO set has actually completed, thanks to the write buffer inside the core. Some cycles are therefore hidden when observed externally, which is why you measure 11ns; in reality, "cmp r1,r2" takes 1 CPU cycle. I hope this helps you understand what you are seeing 😉
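The arithmetic behind the 11ns figure can be sketched in C. This is only an illustration: the function names are hypothetical, and the 124.4ns baseline (pulse width without the cmp instructions) is an assumption taken from the figures reported elsewhere in this thread.

```c
/* Period of one core clock cycle, in nanoseconds. */
double cycle_period_ns(double clock_hz)
{
    return 1e9 / clock_hz;
}

/* Apparent time per inserted instruction, derived from the pulse width
 * measured with and without the extra instructions.
 * baseline_ns is an assumed value, not a new measurement. */
double per_instruction_ns(double with_ns, double baseline_ns, int count)
{
    return (with_ns - baseline_ns) / count;
}
```

With these numbers, per_instruction_ns(235.5, 124.4, 10) comes out near 11.1ns, while cycle_period_ns(72e6) is about 13.9ns. The gap is what the write-buffer overlap described above would account for.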

STOne-32

cosmapa
Associate II
Posted on May 17, 2011 at 12:19

Thank you for the explanations.

The compiler outputs 3 assembly instructions to set and 3 to reset. I understand this is not optimized, but I have difficulty believing that these 3 instructions would take 124ns to execute. Even if one is an access to a slow APB-mapped register, the other 2 instructions access memory or core registers. This would mean that the APB register write takes almost 100ns.

I have verified that the APB2 clock is the same as the main 72MHz clock.

Could you please post the code that you say will toggle at 18MHz?

Thanks

lspr35
Associate II
Posted on May 17, 2011 at 12:19

Could you please post the three assembler lines (from the .lst file) for this C statement: GPIOC->BSRR = GPIO_Pin_7; (= 3 assembly instructions)?

Regards

Squonk

cosmapa
Associate II
Posted on May 17, 2011 at 12:19

Here they are. 3 to set, 3 to reset:

GPIOC->BSRR = GPIO_Pin_7;
080006EE 4805 LDR R0, [PC, #0x014] ; [0x8000704] =GPIOC_BSRR (0x40011010)
080006F0 2180 MOVS R1, #0x80
080006F2 6001 STR R1, [R0, #0]

GPIOC->BRR = GPIO_Pin_7;
080006F4 4804 LDR R0, [PC, #0x010] ; [0x8000708] =GPIOC_BRR (0x40011014)
080006F6 2180 MOVS R1, #0x80
080006F8 6001 STR R1, [R0, #0]

Thank you for taking the time to help
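In the listing above, each statement reloads the register address (LDR) and the pin mask (MOVS) before the actual store (STR). A minimal sketch of one way to give the compiler a chance to keep both in registers is shown below. The struct and function names are hypothetical, not the library's own definitions; the layout is assumed from the BSRR (0x40011010) and BRR (0x40011014) addresses in the listing, which imply a GPIOC base of 0x40011000. Whether the compiler really emits one STR per write depends on its optimization settings.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of the GPIO register block; offsets are assumed
 * from the addresses in the listing (BSRR at +0x10, BRR at +0x14). */
typedef struct {
    volatile uint32_t CRL;   /* 0x00 */
    volatile uint32_t CRH;   /* 0x04 */
    volatile uint32_t IDR;   /* 0x08 */
    volatile uint32_t ODR;   /* 0x0C */
    volatile uint32_t BSRR;  /* 0x10: write 1 to set a pin   */
    volatile uint32_t BRR;   /* 0x14: write 1 to reset a pin */
    volatile uint32_t LCKR;  /* 0x18 */
} gpio_regs_t;

#define PIN_7 ((uint32_t)0x0080u)

/* Set then reset pin 7. The base pointer and the mask are each loaded
 * once, so both writes can share them as base+offset stores. */
void pulse_pin7(gpio_regs_t *port)
{
    const uint32_t mask = PIN_7;
    port->BSRR = mask;
    port->BRR  = mask;
}
```

On the target this would be called as pulse_pin7((gpio_regs_t *)0x40011000); on a PC it can be exercised against an ordinary gpio_regs_t variable.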

roger7
Associate II
Posted on May 17, 2011 at 12:19

This may work, too:

http://www.intel.com/products/processor/core2quad/index.htm?iid=homepage+qc

🙂

lspr35
Associate II
Posted on May 17, 2011 at 12:19

@cosmapa:

hm,... that is a little bit strange to me: all instructions are 16-bit instructions, so the flash access time cannot be the limiting factor. I will think about these instructions.

Regards

Squonk

cosmapa
Associate II
Posted on May 17, 2011 at 12:19

Could it be that the program is running in debug mode? I read that in this mode, changes to peripheral registers are captured using extra hidden steps. Where is that defined?

16-32micros
Associate III
Posted on May 17, 2011 at 12:19

We need to disable the debug mode in the library: this is defined in the stm32f10x_conf.h file. If the DEBUG define is not commented out, you should comment it out.

ivan239955_stm1_st
Associate II
Posted on May 17, 2011 at 12:19

Question to STOne-32:

I did some profiling, and it looks like the STM32 core behaves differently during memory loads/stores from what is described in the Cortex-M3 Technical Reference Manual. Namely:

STR Rx,[Ry,#imm] should be 1 cycle. In STM32 it is always at least 2 cycles.

LDR/LDR and LDR/STR pipelining: with multiple consecutive LDRs, the first one should take 2 cycles and all others 1 cycle. In STM32 all LDR/STR are at least 2 cycles.

What can be the reason for this? Maybe DMA: even if it is not running, is 50% of the RAM bandwidth still reserved for DMA? Or is the store buffer in the M3 core disabled?

If the STM32 core behaved as described in the manual, the maximum toggle frequency should be 36 MHz (= 72 MHz update rate, assuming a 72 MHz APB2 clock).

Hopefully there is a simple explanation, such as some core register setting.

Any ideas?
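The relation between store timing and toggle frequency used above can be written as a small C helper. This is a sketch under a simplifying assumption: the pin is driven by back-to-back stores with nothing else in between, and one full period needs two stores (one set, one reset). The function name is hypothetical.

```c
/* Square-wave frequency produced by back-to-back GPIO stores.
 * One period = one set store + one reset store. */
double toggle_freq_hz(double clock_hz, unsigned cycles_per_store)
{
    return clock_hz / (2.0 * cycles_per_store);
}
```

At 72 MHz, a 1-cycle store gives 36 MHz, and a 2-cycle store gives 18 MHz, which matches the 18MHz figure mentioned earlier in the thread.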

cosmapa
Associate II
Posted on May 17, 2011 at 12:19

Using the IAR tool, the 18MHz example from STOne-32 still compiled into three ASM instructions for each *(vu32*)(0x40011014) = 0x00000080; statement. Toggling was still 125ns high/125ns low.

Compiling with the Raisonance tool produced one instruction per statement after the first one. Toggling speed was the expected 18MHz.

I then recompiled my test that used

GPIOC->BSRR = GPIO_Pin_7;

GPIOC->BRR = GPIO_Pin_7;

The compiler produced 3 ASM instructions for the first and 2 ASM instructions for the second (it was 3 and 3 for IAR).

The pin stays high for 27.3ns vs 124.4ns.

This difference cannot be accounted for by the extra ASM produced by the IAR compiler. The DEBUG flag is not set in the C code. Maybe someone from IAR could help shed some light.
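To put the two pulse widths in terms of core cycles, a measured time can be converted as sketched below. The helper name is hypothetical, and the conversion assumes the core really runs at 72MHz.

```c
/* Convert a measured time in nanoseconds to the nearest whole number
 * of clock cycles (valid for non-negative inputs). */
long ns_to_cycles(double time_ns, double clock_hz)
{
    double cycles = time_ns * clock_hz / 1e9;
    return (long)(cycles + 0.5);
}
```

With these inputs, 27.3ns corresponds to about 2 cycles and 124.4ns to about 9 cycles, so the IAR build spends roughly 7 more cycles per edge than the single extra instruction would suggest.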

As far as I am concerned, I am glad to see the STM32 perform so well.

Thank you to all for your help