2008-11-17 04:39 AM
True performance of STM32?
#stm32
2011-05-17 03:19 AM
Hi cosmap,
If you are using EWARM, configure the project options for the highest optimization level for speed, with maximum inlining. GPIOC->BSRR = GPIO_Pin_7; should then compile to a single store (STR) instruction. You also need to configure APB2 to run at 72 MHz: you are writing to a peripheral on the APB, and if it is not clocked at the same frequency as the CPU, the AHB <-> APB bridge adds wait states.

"If now I insert 10 x asm("cmp r1,r2"); between the Set and Reset, I measure precisely 235.5 (+/- 0.2ns). This means that each of the cmp instruction executed in 11ns. This is a bit faster than expected for a single cycle execution at 72MHz but it is within range."

The ten cmp r1,r2 instructions inserted between the set and the reset execute entirely in the internal registers of the Cortex-M3, without any memory access. Thanks to the write buffer inside the core, they are executed before the first GPIO set has actually taken effect. Some cycles are therefore hidden from the outside, which is why you measure 11 ns per instruction, but in reality cmp r1,r2 takes 1 CPU cycle.

Hope this helps you understand what you are seeing ;)
STOne-32
2011-05-17 03:19 AM
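A rough cross-check of the figures in the post above, as a minimal C sketch. The function names are made up for illustration, and the 124.4 ns baseline (the pulse width without the extra cmp instructions) is an assumption taken from the measurement reported later in this thread.

```c
#include <assert.h>

/* Duration of one clock cycle in nanoseconds for a clock of `hz` hertz. */
double cycle_ns(double hz)
{
    assert(hz > 0.0);
    return 1e9 / hz;
}

/* Average time per inserted instruction, given the pulse width measured
 * with and without `count` extra instructions between set and reset. */
double per_instruction_ns(double with_ns, double without_ns, int count)
{
    assert(count > 0);
    return (with_ns - without_ns) / count;
}
```

With these inputs, cycle_ns(72e6) is about 13.9 ns and per_instruction_ns(235.5, 124.4, 10) is about 11.1 ns, which is where the 11 ns per cmp figure comes from.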
Thank you for the explanations.
The compiler outputs 3 assembly instructions to set and 3 to reset. I understand this is not optimized, but I have difficulty believing that these 3 instructions take 124 ns to execute. Even if one is an access to a slow APB-mapped register, the other 2 only access memory or core registers. This would mean that the APB register write takes almost 100 ns. I have verified that the APB2 clock is the same as the main 72 MHz clock.
Could you please post the code that you say will toggle at 18 MHz?
Thanks
2011-05-17 03:19 AM
Could you please post the three assembler lines (from the .lst file) generated for this C statement: GPIOC->BSRR = GPIO_Pin_7; ?
Regards
Squonk
2011-05-17 03:19 AM
Here they are. 3 to set, 3 to reset:
GPIOC->BSRR = GPIO_Pin_7;
080006EE 4805  LDR  R0, [PC,#0x014]  ; [0x8000704] =GPIOC_BSRR (0x40011010)
080006F0 2180  MOVS R1, #0x80
080006F2 6001  STR  R1, [R0, #0]
GPIOC->BRR = GPIO_Pin_7;
080006F4 4804  LDR  R0, [PC,#0x010]  ; [0x8000708] =GPIOC_BRR (0x40011014)
080006F6 2180  MOVS R1, #0x80
080006F8 6001  STR  R1, [R0, #0]
Thank you for taking the time to help
2011-05-17 03:19 AM
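For readers unfamiliar with the two registers written in the listing, here is a small host-side model of what a BSRR or BRR write does to the port's output data register on STM32F1 parts. It is only an illustrative sketch: gpio_model_t and the model_write_* functions are hypothetical names, and nothing here touches real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for one GPIO port's 16-bit output data register. */
typedef struct {
    uint16_t odr;
} gpio_model_t;

/* BSRR write: bits 0..15 set the matching output bits, bits 16..31
 * reset them. If both are requested for a pin, the set wins. */
void model_write_bsrr(gpio_model_t *p, uint32_t value)
{
    assert(p != 0);
    uint16_t set_bits = (uint16_t)(value & 0xFFFFu);
    uint16_t reset_bits = (uint16_t)(value >> 16);
    p->odr = (uint16_t)((p->odr & ~reset_bits) | set_bits);
}

/* BRR write: bits 0..15 reset the matching output bits. */
void model_write_brr(gpio_model_t *p, uint32_t value)
{
    assert(p != 0);
    p->odr = (uint16_t)(p->odr & ~(value & 0xFFFFu));
}
```

Writing 0x80 (GPIO_Pin_7) through model_write_bsrr sets bit 7 of odr, and writing the same value through model_write_brr clears it again, without a read-modify-write in the calling code. That is why each C statement in the listing needs only one store to the peripheral.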
This may work, too:
http://www.intel.com/products/processor/core2quad/index.htm?iid=homepage+qc
:)
2011-05-17 03:19 AM
@cosmapa:
Hm, that is a little strange to me: all of these are 16-bit instructions, so the flash access time cannot be the limiting factor. I will think about these instructions some more.
Regards
Squonk
2011-05-17 03:19 AM
Could it be that the program is running in debug mode? I read that in this mode, changes to peripheral registers are captured using extra hidden steps. Where is that defined?
2011-05-17 03:19 AM
Question to STone-32:
I did a couple of profiling runs, and it looks like the STM32 core behaves differently during memory loads/stores from what is described in the Cortex-M3 Technical Reference Manual. Namely:
STR Rx,[Ry,#imm] should take 1 cycle. On the STM32 it always takes at least 2 cycles.
LDR/LDR and LDR/STR pipelining: in a run of consecutive LDRs, the first one should take 2 cycles and all the others 1 cycle. On the STM32 every LDR/STR takes at least 2 cycles.
What can be the reason for this? Maybe DMA: even if it is not running, is 50% of the RAM bandwidth still reserved for DMA? Or is the store buffer in the M3 core disabled?
If the STM32 core behaved as described in the manual, the maximum toggle frequency would be 36 MHz (= 72 MHz update rate, assuming a 72 MHz APB2 clock).
Hopefully there is a simple explanation, such as some core register setting. Any ideas?
2011-05-17 03:19 AM
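The 36 MHz figure follows from simple arithmetic: one output period needs two stores, a set and a reset. A minimal sketch, assuming back-to-back stores with no loop or branch overhead; toggle_hz is a hypothetical helper name.

```c
#include <assert.h>

/* Output square-wave frequency in Hz when each edge (one store to the
 * GPIO register) costs `cycles_per_store` CPU cycles. One full period
 * is two edges: set, then reset. */
double toggle_hz(double cpu_hz, unsigned cycles_per_store)
{
    assert(cycles_per_store > 0);
    return cpu_hz / (2.0 * cycles_per_store);
}
```

toggle_hz(72e6, 1) gives 36 MHz, the one-cycle STR case from the manual, while two cycles per store gives 18 MHz.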
Using the IAR tool, the 18 MHz example from STOne-32 still compiled into three ASM instructions for each *(vu32*)(0x40011014)= 0x00000080; statement. The pin still toggled at 125 ns high / 125 ns low.
Compiling with the Raisonance tool produced one instruction per statement after the first one, and the toggling speed was the expected 18 MHz. I then recompiled my test that used GPIOC->BSRR = GPIO_Pin_7; GPIOC->BRR = GPIO_Pin_7; The compiler produced 3 ASM instructions for the first statement and 2 for the second (it was 3 and 3 with IAR). The pin stays high for 27.3 ns vs 124.4 ns. This difference cannot be accounted for by the extra ASM produced by the IAR compiler. The DEBUG flag is not set in the C code. Maybe someone from IAR could help shed some light.
As far as I am concerned, I am glad to see the STM32 perform so well. Thank you all for your help.
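Converting the two measured high times into CPU cycles makes the gap easier to see. This is only a sketch, assuming a 72 MHz core clock as stated earlier in the thread; ns_to_cycles is a hypothetical helper name.

```c
#include <assert.h>

/* Number of CPU cycles (possibly fractional) that fit into `ns`
 * nanoseconds at a core clock of `cpu_hz` hertz. */
double ns_to_cycles(double ns, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return ns * cpu_hz / 1e9;
}
```

At 72 MHz, 27.3 ns comes out at roughly 2 cycles and 124.4 ns at roughly 9 cycles, so the IAR build spends about 7 more cycles per half period than one extra instruction would explain.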