2008-11-17 04:39 AM
True performance of STM32?
#stm32
2011-05-17 03:19 AM
Hi
Your result is very interesting. Do you know whether the STR in your previous example is a 16-bit or a 32-bit instruction?
Regards
Squonk
2011-05-17 03:19 AM
Hello,
I made a similar test with 40 consecutive asm() instructions, with the following results:
asm("adds r1, r2, #1") with 2 wait states: approx. 16 ns
asm("adds r1, r2, #1") with 1 wait state: approx. 14 ns
asm("mul r1, r2, r3") with 2 wait states: approx. 23 ns
asm("mul r1, r2, r3") with 1 wait state: approx. 16 ns
Obviously the 14 ns can be achieved by 16-bit instructions (adds), whereas the 32-bit instruction (mul) takes 1.5 times 14 ns. My time measurements are not very precise, since I do not have an oscilloscope available at the moment. Additionally, I think that other effects also have an impact on instruction execution times (e.g. RAM access, ...). Detailed information about instruction execution times would be very interesting. The ST documentation is not very verbose on this topic. Is it possible to find better information in the Cortex documentation?
Regards
Squonk
2011-05-17 03:19 AM
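The per-instruction times above are easier to compare once they are expressed in CPU cycles. The following is a minimal sketch of that conversion. The 72 MHz clock is an assumption taken from other posts in this thread, and the helper name ns_to_cycles is made up for illustration.

```c
#include <assert.h>

/* Convert a measured duration in nanoseconds into CPU cycles.
 * cpu_hz is the core clock; 72e6 (72 MHz) is assumed in this thread,
 * but the post with the measurements does not state it explicitly. */
double ns_to_cycles(double ns, double cpu_hz)
{
    assert(cpu_hz > 0.0);          /* a clock of zero makes no sense */
    return ns * 1e-9 * cpu_hz;     /* seconds * cycles-per-second */
}
```

At an assumed 72 MHz, ns_to_cycles(14.0, 72e6) comes to about 1.01 cycles and ns_to_cycles(23.0, 72e6) to about 1.66 cycles, so the fastest adds result is close to one cycle per instruction.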
I think that it is difficult to measure instruction timing unless you use assembler. With a C compiler you don't know what you get unless you disassemble the output.
Most Cortex-M3 instructions execute in 1 cycle. STR and LDR usually take 2 cycles (1 when pipelined, e.g. several LDR or STR instructions in sequence). STR Rx,[Ry,#imm] is always one cycle.
A few notes on documents with useful instruction timing/descriptions:
CortexM3_TechRefMan_r1p1_trm.pdf (infocenter.arm.com/help/topic/com.arm.doc.ddi0337e/DDI0337E_cortex_m3_r1p1_trm.pdf): the most useful thing here is the instruction timing, pages 18-1 to 18-8.
DDI0405B_arm_v7m_architecture_app_level_reference_manual.pdf (infocenter.arm.com/help/topic/com.arm.doc.ddi0405b/index.html): contains a detailed description of every instruction, pages A6-1 to A6-276.
Hopefully this helps...
Ivan
2011-05-17 03:19 AM
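The rules of thumb above (1 cycle for most instructions, 2 for LDR/STR unless pipelined, 1 for STR with immediate offset) can be turned into a rough estimator. This is only a sketch of those three rules as stated in the post, not the full timing table from the ARM manual. It ignores Flash wait states and branches, and it assumes that only back-to-back LDR/STR count as "pipelined". The names op_kind and estimate_cycles are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction classes, following the rules of thumb quoted in the post. */
enum op_kind {
    OP_OTHER,    /* "most instructions": 1 cycle */
    OP_LDR_STR,  /* LDR/STR: 2 cycles, 1 if it directly follows another one */
    OP_STR_IMM   /* STR Rx,[Ry,#imm]: always 1 cycle */
};

/* Rough cycle estimate for a straight-line instruction sequence.
 * Assumption: only consecutive OP_LDR_STR entries are treated as pipelined. */
unsigned estimate_cycles(const enum op_kind *ops, size_t n)
{
    unsigned total = 0;
    int prev_was_ldr_str = 0;

    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_LDR_STR) {
            total += prev_was_ldr_str ? 1u : 2u;
            prev_was_ldr_str = 1;
        } else {
            total += 1u;               /* OP_OTHER and OP_STR_IMM */
            prev_was_ldr_str = 0;
        }
    }
    return total;
}
```

For the sequence LDR, LDR, ADD, STR-with-immediate, this model gives 2 + 1 + 1 + 1 = 5 cycles. Real measurements can be higher because of Flash wait states.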
GPIOA->BSRR = GPIO_Pin_10;
GPIOA->BRR = GPIO_Pin_10;
compile to 3 instructions each using the IAR toolset. Which compiler/toolset are you using to get each statement compiled to a single ARM instruction, as stated in the earlier post?
Thanks
2011-05-17 03:19 AM
Hi Mr Swiss.
Remember that it was one STR instruction per C line when *repeatedly* used (20 lines or so), as I described earlier in the post. Guess which compiler/toolchain I use! (The one recommended by ST.)
2011-05-17 03:19 AM
Actually, when I looked again, it seems ST recommends a whole bundle of toolchains.
I use ST/Raisonance Rlink with RIDE 7 and gcc.
2011-05-17 03:19 AM
Dear all,
Today, the best compiler for STM32 (Cortex-M3) in terms of execution speed is the ARM/Keil one, using aggressive time optimization (-Otime with -O2 or -O3). For GPIO toggling, we can reach a pin toggle rate of up to 18 MHz when running from Flash with 2 wait states at 72 MHz, with code like:
STR (2 CPU cycles)
STR (2 CPU cycles)
...
...
If you insert a jump or a branch, it will of course take about 3 cycles in a row and stall/flush the 3-stage pipeline of the core and the Flash accelerator. To increase the performance of such code, you can force inlining and loop unrolling using compiler options.
Thank you :)
STOne-32
2011-05-17 03:19 AM
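The 18 MHz figure follows directly from the cycle counts quoted above: two 2-cycle STR instructions make one full pin period of 4 cycles at 72 MHz. The sketch below just does that arithmetic. It assumes fully unrolled code with no other overhead, and the helper name toggle_hz is made up for illustration.

```c
#include <assert.h>

/* Frequency of the square wave on the pin, given the core clock and the
 * number of CPU cycles spent per full period (one set plus one reset,
 * plus any extra instructions such as a branch). */
double toggle_hz(double cpu_hz, unsigned cycles_per_period)
{
    assert(cycles_per_period > 0);
    return cpu_hz / (double)cycles_per_period;
}
```

With the numbers from the post, toggle_hz(72e6, 2 + 2) gives 18 MHz. Adding a branch of about 3 cycles per period, toggle_hz(72e6, 2 + 2 + 3), drops this to roughly 10.3 MHz, which is why unrolling helps so much here.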
Quote:
I use ST/Raisonance Rlink with RIDE 7 and gcc.
You're from Sweden and you don't use IAR's EWARM? Shame on you! :-]
2011-05-17 03:19 AM
Hi Arm_Wrestler, all,
IAR EWARM also provides a very good trade-off between code execution speed and size. For GNU, here is an interesting discussion :)
2011-05-17 03:19 AM
I am still at a loss over the slow GPIO toggle speed that I see.
If I set and reset bit 7 of port C with the code below
GPIOC->BSRR = GPIO_Pin_7; ( = 3 assembly instructions)
GPIOC->BRR = GPIO_Pin_7; ( = 3 assembly instructions)
I measure a pulse that stays high for precisely 124.4 ns (+/- 0.2 ns). If I now insert 10 x asm("cmp r1,r2"); between the set and the reset, I measure precisely 235.5 ns (+/- 0.2 ns). This means that each of the cmp instructions executed in about 11 ns. This is a bit faster than expected for single-cycle execution at 72 MHz, but it is within range. The part that confuses me is the 124 ns for executing the 3 instructions that reset the GPIO bit. This is a huge discrepancy. Where could this come from?
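The arithmetic behind these numbers can be written out as a small sketch. It takes the two pulse widths reported above, derives the time per inserted cmp, and expresses the baseline pulse in cycles. The 72 MHz clock comes from the post, while the helper names are made up for illustration.

```c
#include <assert.h>

/* Time per inserted instruction, from the pulse width without and with
 * n_inserted extra instructions between the set and the reset. */
double ns_per_inserted(double base_ns, double padded_ns, unsigned n_inserted)
{
    assert(n_inserted > 0);
    return (padded_ns - base_ns) / (double)n_inserted;
}

/* Express a pulse width in CPU cycles; cpu_mhz is the core clock in MHz. */
double pulse_cycles(double pulse_ns, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return pulse_ns * cpu_mhz / 1000.0;   /* ns * MHz = cycles * 1000 */
}
```

With the measured values, ns_per_inserted(124.4, 235.5, 10) is about 11.1 ns, and pulse_cycles(124.4, 72.0) is about 9 cycles. Put that way, the question is why the 3 instructions of the reset statement account for roughly 9 cycles.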