How to get fastest instruction speed on STR9

ebrombaugh
Associate II
Posted on July 23, 2007 at 22:49

ebrombaugh
Associate II
Posted on May 17, 2011 at 09:45

Hi,

I'm just getting started with STR9 programming and I've got a very fundamental question: roughly how many instructions/second can the ARM966E in the STR9 run, assuming the following conditions?

* Max clock rate (96MHz)

* Program resides in Flash with proper wait states

* Data in SRAM

* No I/O, flash data or peripheral access

What settings would I have to use to achieve this?

The reason I ask is that I've got an STR912FW44 here, and when I set it up in a short loop to toggle a GPIO bit (3-instruction loop in Flash), the instruction rate comes out to ~25MHz. I'm running from the PLL set to 96MHz. That seems kind of slow.
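To put a number on that: a 96MHz core clock and a ~25MHz instruction rate work out to roughly 3.84 clocks per instruction, or about 11.5 clocks per pass of the 3-instruction loop. A minimal C sketch of that arithmetic (the function names are illustrative only, not from any ST library):

```c
/* Average core clocks spent per instruction, given the core clock and a
 * measured instruction rate (both in Hz). instr_per_sec must be non-zero. */
static double clocks_per_instruction(double core_hz, double instr_per_sec)
{
    return core_hz / instr_per_sec;
}

/* Core clocks spent per loop pass for a loop of instr_per_loop instructions. */
static double clocks_per_iteration(double core_hz, double instr_per_sec,
                                   unsigned instr_per_loop)
{
    return clocks_per_instruction(core_hz, instr_per_sec) * instr_per_loop;
}
```

Calling clocks_per_instruction(96e6, 25e6) gives the 3.84 figure above.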

Any suggestions appreciated!

Eric

ebrombaugh
Associate II
Posted on May 17, 2011 at 09:45

(replying to self)

An earlier msg from mirou pointed out AN2551 which goes into a lot of detail about optimizing:

http://www.st.com/stonline/products/literature/an/13563.pdf

This suggests it should be possible to approach 96MIPS with careful setup and coding. I'll look into what's going on with my system - maybe I overlooked something.

Eric

m_j_butcher
Associate II
Posted on May 17, 2011 at 09:45

Hi

Using the application note and buffered I/O I have been able to achieve 6MHz port toggling (this corresponds to the value - 12MHz edges - advertised by ST when running at 96MHz).

At a recent ARM seminar the tutor said that the average instruction rate is typically 1.9 clocks per instruction. This would give about 50MIPS at a 96MHz clock.

Recently I had to add a short delay loop to correct a hardware issue. I calculated the loop count from the assembler instructions generated, the 96MHz clock rate and the 1.9 conversion factor, and the count that I set did indeed generate the delay that I wanted (and calculated).
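That kind of delay calculation can be sketched roughly as follows. It assumes the 1.9 clocks-per-instruction average quoted above, and the number of instructions per loop pass has to be read from the compiler's assembler listing. The function names and the times-ten fixed-point scaling are illustrative choices, not anything from the original code.

```c
#include <stdint.h>

/* Loop passes needed for a delay of delay_us microseconds.
 * cpi_x10 is the ASSUMED average clocks per instruction times ten
 * (19 for the 1.9 figure); instr_per_pass comes from the assembler
 * listing. Both must be non-zero. */
static uint32_t delay_loop_count(uint32_t delay_us, uint32_t clock_mhz,
                                 uint32_t cpi_x10, uint32_t instr_per_pass)
{
    /* Clocks available in the delay, scaled by ten to match cpi_x10. */
    uint64_t clocks_x10 = (uint64_t)delay_us * clock_mhz * 10u;
    return (uint32_t)(clocks_x10 / ((uint64_t)cpi_x10 * instr_per_pass));
}

/* Simple busy-wait; volatile keeps the compiler from removing the loop. */
static void delay_loop(volatile uint32_t count)
{
    while (count--) {
    }
}
```

For example, a 100us delay at 96MHz with a 3-instruction loop comes out at delay_loop_count(100, 96, 19, 3) = 1684 passes. The volatile counter changes the instructions generated per pass, so that number should be re-checked against the listing for whatever loop is actually used.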

I am not absolutely sure myself because I was originally expecting 96MIPS (1 clock for 1 instruction), but 50MIPS does seem more like the practical result.

Final point: I have a benchmark which was running on a Rev. D chip. The same benchmark running on a Rev. G (with improved pipeline) was 13% faster. ST expects up to 30% speed improvement between these revisions (which may be true but is probably somewhat code dependent).

Regards

Mark

http://www.uTasker.com

mark9
Associate II
Posted on May 17, 2011 at 09:45

My experience with the GPIO hardware is that two back-to-back writes to it cannot be pipelined. I'm pretty sure this is a limitation of the GPIO hardware (ST Micro) and not the CPU (ARM). You can check this by adding NOPs to your toggling loop: it should *not* slow down the output. In other words, the GPIO hardware is interlocking the processor if you write to it too quickly, but you are free to do non-GPIO tasks while the GPIO hardware pipeline gets flushed.
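A rough sketch of that NOP experiment, for anyone who wants to try it. The function name and parameters are hypothetical, the register pointer is whatever GPIO data register address your device header provides, and the inline `nop` uses the GCC/Clang `__asm__` extension. Note that the inner padding loop adds its own branch overhead, so for a precise measurement you would unroll the NOPs by hand.

```c
#include <stdint.h>

/* Toggle the bits in mask at *reg `toggles` times, running `pad` NOPs
 * after each write. Compare the output frequency on a scope for
 * different pad values. */
static void toggle_padded(volatile uint32_t *reg, uint32_t mask,
                          uint32_t toggles, uint32_t pad)
{
    for (uint32_t i = 0; i < toggles; i++) {
        *reg ^= mask;                      /* read-modify-write of the port */
        for (uint32_t p = 0; p < pad; p++) {
            __asm__ volatile ("nop");      /* non-GPIO filler work */
        }
    }
}
```

If the output frequency stays the same for small pad values, the time is being spent waiting on the GPIO write rather than in the core.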

The bottom line is that you should not use the STR91x to do bit banging.

ebrombaugh
Associate II
Posted on May 17, 2011 at 09:45

@Mark: Thanks for the info on cycles/instruction. Disappointing, but not surprising. I was under the impression that most ARM instructions were single cycle, except for branches etc. It's worth noting though that my copy of the ARM architecture ref. manua

ebrombaugh
Associate II
Posted on May 17, 2011 at 09:45

OK,

Here's the ARM document with instruction timing for the 966E:

http://www.arm.com/pdfs/DDI0240A_9ES_R2.pdf

Look in section 7 for details. One interesting thing I came across while searching was a statement that, with all the pipelines etc., there's no guarantee that a NOP takes any time at all!

Eric