2006-10-05 01:44 AM
2011-05-17 12:31 AM
STARM,
Do you have any idea how to get better performance from the STR9? Or is what I am getting already the best it can do?
Regards,
2011-05-17 12:31 AM
Hi Han,
I have tested the floating-point performance of the STR912 @96MHz against the LPC2294 @60MHz and the M16C62P @24MHz. The times needed for one floating-point multiplication are:
STR912  : 7.5 us
LPC2294 : 4.3 us
M16C62P : 30 us
One way to interpret this is that the real CPU clock of the STR912 is 32 MHz. I also tried doubling the clock speed, and the MCU was dead. :)
Regards.
2011-05-17 12:31 AM
honeyman wrote:
I have tested the floating point performance STR912 @96MHz vs LPC2294 @60MHz and M16C62P @24MHz.
IMHO, it is a useless test, since neither of the two ARM processors (LPC and STR) has any native floating-point hardware; both therefore depend on a floating-point library and on its quality. You did not say which compiler(s) you used, so I say it again: the information you provided has little to no merit for anyone interested in this thread. You may want to compare integer or DSP operations (the latter not on ARM7); then you might get closer to a comparison with some merit. Benchmarking is not simple, and often it is not even sensible, since your design and YOUR application are the best benchmark to be had.
[ This message was edited by: hARMless on 29-09-2006 17:15 ]
2011-05-17 12:31 AM
"Benchmarking is not simple, and often it is not even sensible, since your design and YOUR application are the best benchmark to be had."
I second that. I never care about benchmarking as such. The problem is that I compared the same application running on the STR7 and other ARM7 CPUs against the STR9. In no case can I see any benefit from the ARM9 core or from the higher 96 MHz clock. It behaves like another vendor's ARM7 at 60 MHz. My application has lots of branches, though, and it seems that branches cost too much on the STR9.
honeyman, floating-point performance should be much better than on the LPC2292 if you use the same compiler. Did you enable the PFQBC (prefetch queue and branch cache)?
Regards,
[ This message was edited by: Han on 01-10-2006 11:33 ]
2011-05-17 12:31 AM
Han wrote:
In any case, I cannot get any benefit from the ARM9 core or the higher 96 MHz clock speed.
It behaved like another brand's 60 MHz ARM7, but my application has lots of branches;
it seems like branches cost too much with the STR9.
Sure, vendors optimize the core-to-memory interface for their flash process; some optimize for Thumb (Atmel), some for ARM (NXP). Each vendor has different flash performance (speed, etc.). Having said that, flash bolted properly to an ARM9 should, in theory, run faster than an ARM7 in order to be the SuperARM7 that ST marketing tells us about. However, if SuperARM7 means a "slow ARM9", then what you are actually getting is ARM7 + DSP instructions = ARMv5TE. That is great! ...if only the price would drop to $7 or so.
The overall design of the STR9 is good, IMHO. ARM9E is a much better design than ARM7; this is a fact. Maybe it is pilot error (users not yet being educated about ARM-style programming)?
Branching costs more because a deeper pipeline has to be refilled after it is flushed, but the depth is not the issue. The issue is speed: how fast that pipeline refill happens. It's time to hit the datasheets.
2011-05-17 12:31 AM
I contacted some people who use the STR9 in their designs. They got the same results as I did. It seems it would be very hard to find a pilot good enough for the STR9.
Anyway, for other reasons (DMA, big memory, very nice Ethernet MAC) I think the STR9 is a nice MCU too, just not as good as specified.
2011-05-17 12:31 AM
ARM says (ARM DUI 0056D, page 7-23):
Start of quote:
7.7.4 ARM966E-S performance issues
The ARM966E-S runs at peak performance when:
• executing code contained in TCM
• accessing data contained in TCM
• the write buffer is enabled.
The ARM9E core stalls:
• For one cycle, when a read from data TCM immediately follows a write to data TCM.
• For one cycle, when a data read from instruction TCM occurs.
• For two cycles, when a data write to instruction TCM occurs.
• When external memory is accessed. In this case, the number of stalled cycles depends on:
  - the write buffer draining
  - the external memory system.
End of quote.
Han said: It seems it would be very hard to find a pilot good enough for the STR9.
Exactly my point; I chose this metaphor so nobody's ego would get hurt. There is the F-22 'Raptor' and there is the Cessna 'Skyhawk'. It takes a different set of pilot skills to fly each; my guess is that the F-22 pilot could handle the Cessna, but the other way around might not work so well.
The big question is: what is the STR9? Is it a 'Raptor'? With Tightly Coupled Memory (TCM)? Hardly. More deterministic than other ARM9s with I/D cache? Yes. Faster than ARM7? It depends; maybe not, judging by your posts.
I think NXP with the LPC2300 will make STMicro with the STR9 a bit nervous (co$t). Hint: at half the price, it might become a 'Tomcat'?
Ladies & Gentlemen, choose your planes carefully. :)
2011-05-17 12:31 AM
On 08-08-2006 at 15:19, Han wrote:
/* This function for STR9 */
void ToggleLed(void)
{
    static int ledv=0;
    ledv^=1;
    GPIO7->DR[0x3FC] =ledv;
}
void FuncTest(void)
{
    register int ToggleCounter=10000000;
    volatile register int Val2=0;
    ....
    volatile register int Val8=0;
    volatile int Val9=0;
    ....
    volatile int Val19=0;
    for(;;)
    {
        while (ToggleCounter--)
        {
            ++Val2;
            ..
            ++Val19;
        }
Just a curious comment, if I may? In ARM state (and non-FIQ mode) there are r0-r12 to work with, possibly fewer (I am waving my hands here, don't shoot, please). The compiler will try to use those for local variables; if the variables do not fit into r0-r12, a register-to-stack spill takes place. That means the processor will access memory.
Your code has 6 volatile register declarations and eleven (11) other local variables. Let's suppose the compiler allocated the variables declared with 'register' to registers (after all, that is what the keyword is for). Where will the remaining 11 variables go to and come from as they are being incremented?
BTW, in Thumb state it gets (even) worse, since most compilers will use only r0-r7.
2011-05-17 12:31 AM
I coded it in this style intentionally. I had two reasons:
1. To prevent compiler optimisations. Without volatile, the compiler (especially RVDS) simply removes the addition operations.
2. With volatile, I noticed that all the popular compilers (IAR, GCC, RVDS) generate almost the same code.
So, whichever compiler is used, people can reproduce the same results on the same CPU, and we can compare them. As you can guess, my application does not just consist of incrementing local variables. Whatever the conditions, reasons or code style, I was expecting the STR9 to outperform any ARM7, especially an ARM7 with a slower clock.
Anyway, I like the STR9 very much for other reasons.
Regards,
[ This message was edited by: Han on 05-10-2006 14:17 ]