2007-07-23 01:49 PM
How to get fastest instruction speed on STR9
2011-05-17 12:45 AM
Hi,
I'm just getting started with STR9 programming and I've got a very fundamental question: roughly how many instructions/second can the ARM966E in the STR9 run? Assuming the following conditions:
* Max clock rate (96MHz)
* Program resides in Flash with proper wait states
* Data in SRAM
* No I/O, flash data or peripheral access
What settings would I have to use to achieve this?
The reason I ask is that I've got an STR912FW44 here, and when I set it up in a short loop to toggle a GPIO bit (3 instruction loop in Flash), the instruction rate comes out to ~25MHz. I'm running from the PLL set to 96 MHz. Seems to be kind of slow.
Any suggestions appreciated!
Eric
2011-05-17 12:45 AM
(replying to self)
An earlier msg from mirou pointed out AN2551, which goes into a lot of detail about optimizing:
http://www.st.com/stonline/products/literature/an/13563.pdf
This suggests it should be possible to approach 96MIPS with careful setup and coding. I'll look into what's going on with my system - maybe I overlooked something.
Eric
2011-05-17 12:45 AM
Hi
Using the application note and buffered I/O I have been able to achieve 6MHz port toggling (this corresponds to the 12MHz edge rate advertised by ST when running at 96MHz).
At a recent ARM seminar the tutor said that the average is typically 1.9 clocks per instruction. This would give about 50MIPS at a 96MHz clock (96 / 1.9 ≈ 50.5). Recently I had to add a short delay loop to correct a hardware issue. I calculated the loop count from the assembler instructions generated, the 96MHz clock rate and the 1.9 conversion factor, and the loop count that I set did indeed generate the delay that I wanted (and calculated). I am not absolutely sure myself, because I was originally expecting 96MIPS (1 clock for 1 instruction), but 50MIPS does seem more like the practical result.
Final point: I have a benchmark which was running on a Rev. D chip. The same benchmark running on a Rev. G (with improved pipeline) was 13% faster. ST expects up to 30% speed improvement between these revisions (which may be true but is probably somewhat code dependent).
Regards
Mark
2011-05-17 12:45 AM
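For reference, the delay-loop estimate described in Mark's post can be sketched in C as below. This is only an illustration of the arithmetic, not Mark's actual code: the function names are made up, the instruction count per loop iteration is something you would read off your own assembler listing, and the 1.9 clocks-per-instruction figure is the seminar's rule of thumb rather than a guaranteed value.

```c
#include <assert.h>
#include <stdint.h>

/* Effective instruction rate in MIPS for a core clock in MHz and an
 * assumed average cycles-per-instruction (CPI) figure. */
static double effective_mips(double clock_mhz, double cpi)
{
    assert(cpi > 0.0);
    return clock_mhz / cpi;
}

/* Estimated number of loop iterations for a delay of delay_us microseconds.
 * instr_per_iter: instructions in one loop iteration (hypothetical input,
 *                 taken from the compiler's assembler listing).
 * cpi:            assumed average clocks per instruction (1.9 in the post). */
static uint32_t delay_loop_count(double delay_us, double clock_mhz,
                                 unsigned instr_per_iter, double cpi)
{
    assert(instr_per_iter > 0 && cpi > 0.0);
    double cycles_per_iter = instr_per_iter * cpi;
    double total_cycles = delay_us * clock_mhz;  /* MHz * us = clock cycles */
    return (uint32_t)(total_cycles / cycles_per_iter + 0.5); /* round to nearest */
}
```

With these assumptions, `effective_mips(96.0, 1.9)` comes out at roughly 50.5, and a 10 us delay built from a 3-instruction loop would need about `delay_loop_count(10.0, 96.0, 3, 1.9)` iterations. The real figure depends on the actual instruction mix, so it is worth confirming on a scope.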
My experience with the GPIO hardware is that two back-to-back writes to it cannot be pipelined. I'm pretty sure this is a limitation of the GPIO hardware (ST Micro) and not the CPU (ARM). You can check this by adding NOPs to your toggling loop. It should *not* slow down the output. In other words, the GPIO hardware interlocks the processor if you write to it too quickly, but you are free to do non-GPIO tasks while the GPIO hardware pipeline gets flushed.
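A minimal sketch of that NOP experiment, assuming a GCC- or Clang-style compiler for the inline `nop`. The function name is hypothetical, and the port is passed in as a pointer, so the real GPIO data register address for your part has to come from the STR91x reference manual. Note that the padding loop adds its own loop-control instructions on top of the NOPs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toggle the bits in `mask` at *port `toggles` times, executing `pad`
 * NOPs after each write. `port` would point at a GPIO data register on
 * the target (address not given here; look it up for your device). */
static void toggle_with_padding(volatile uint32_t *port, uint32_t mask,
                                unsigned toggles, unsigned pad)
{
    assert(port != NULL);
    for (unsigned i = 0; i < toggles; ++i) {
        *port ^= mask;                      /* read-modify-write toggle */
        for (unsigned n = 0; n < pad; ++n) {
            __asm__ volatile ("nop");       /* padding between GPIO writes */
        }
    }
}
```

Run it with `pad` set to 0, 1, 2 and so on while watching the pin on a scope. If the output frequency stays the same for small `pad` values, the writes are being held up by the GPIO hardware rather than by the core.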
The bottom line is that you should not use the STR91x to do bit banging.
2011-05-17 12:45 AM
@Mark: Thanks for the info on cycles/instruction. Disappointing, but not surprising. I was under the impression that most ARM instructions were single cycle, except for branches etc. It's worth noting though that my copy of the ARM architecture ref. manua
2011-05-17 12:45 AM
OK,
Here's the ARM document with instruction timing for the 966E:
http://www.arm.com/pdfs/DDI0240A_9ES_R2.pdf
Look in section 7 for details. One interesting thing I came across while searching was a statement that, with all the pipelining etc., there's no guarantee that a NOP takes any time at all!
Eric