Slow GPIO output

johnsim · 2005-09-20

Posted on September 20, 2005 at 11:40

Hi,

I'm trying to ensure that I've got the chip running as fast as possible. RCCU_FrequencyValue(RCCU_MCLK) reports 48 MHz. I've got a simple test:

while(1)

{

GPIO_WordWrite(GPIO0, 0x0000);

GPIO_WordWrite(GPIO0, 0xFFFF);

}

This should create a square wave output, and with the chip only executing a few instructions it should be in the tens of MHz. However, the maximum I'm getting is 1.33 MHz.
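
As a quick sanity check on those figures, the measured rate can be turned into a cycle budget. The sketch below is not from the original post: the helper name `cycles_per_period` is made up for illustration, and it assumes only the 48 MHz MCLK and the 1.33 MHz output quoted above.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the STR7x library): how many core
 * clock cycles elapse during one full period of the output square wave. */
uint32_t cycles_per_period(uint32_t mclk_hz, uint32_t output_hz)
{
    /* Integer division truncates, which is precise enough for an estimate. */
    return mclk_hz / output_hz;
}
```

With 48 MHz and 1.33 MHz this works out to about 36 cycles per period, or roughly 18 cycles per port write.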

What am I doing wrong?

Using IAR Embedded Workbench with the J-Link (Segger) ICE Link.

Thanks,

John.

rsherry · 2005-09-20

Posted on September 20, 2005 at 12:44

John,

Did you check the generated assembly and count the instruction cycles? This might give you a clue as to what frequency you should really expect your GPIOs to toggle at. I think the STR7x Library has to dereference at least one pointer to access the port, among other things.

To do your test you should probably use a couple of assembly instructions instead of the GPIO library so that you are sure that it will execute as fast as possible.

Best regards,

Ryan.

johnsim · 2005-09-20

Posted on September 20, 2005 at 13:19

Hi Ryan,

This is what I can't understand: the assembly is quite efficient and, from what I can see, is not using 50 cycles:

GPIO_WordWrite(GPIO0, 0x0000);

00000BA4 E3A01000 MOV R1, #0x0

00000BA8 E3A004E0 MOV R0, #0xE0000000

00000BAC E3800DC0 ORR R0, R0, #0x3000

00000BB0 EBFFFF80 BL GPIO_WordWrite ; 0x9B8

GPIO_WordWrite(GPIO0, 0xFFFF);

00000BB4 E3A010FF MOV R1, #0xFF

00000BB8 E3811CFF ORR R1, R1, #0xFF00

00000BBC E3A004E0 MOV R0, #0xE0000000

00000BC0 E3800DC0 ORR R0, R0, #0x3000

00000BC4 EBFFFF7B BL GPIO_WordWrite ; 0x9B8

00000BC8 EAFFFFF5 B 0x000BA4

Any microcontroller should be able to do this very well. I need quite fast external control to generate addresses and access memory. I'm not moving data into the ARM, but I need it to create addresses and access control. Hence I want fast GPIO.

Any help guys?

All the best,

John.

rsherry · 2005-09-20

Posted on September 21, 2005 at 02:51

Hi John,

Actually, 50 cycles is probably pretty close to what you should expect. I'm not an ARM assembly expert by any stretch, but I believe the MOV and ORR instructions take 1 cycle each. BL is a branch-with-link instruction (i.e. a function call, like the CALL or JSR instructions of many other instruction sets), and each one will cost you 3 clock cycles. You didn't include the assembly generated for the GPIO_WordWrite function, but I'm guessing it will be another 2 or 3 clock cycles, plus the probable pipeline flush when you return, costing you another couple of cycles.

So you are looking at:

7 cycles total for the MOV, ORR instructions.

9 cycles total for the 2 BL and 1 B instructions.

Plus 2X the cycles required in the GPIO_WordWrite function.

Plus any pipeline flushes as a consequence of the B and BL's.
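
The tally above can be written out as a small calculation. This is only a sketch: the per-instruction costs are the ones assumed in this post (1 cycle per MOV/ORR, 3 per B/BL), not measured values, and the cost of the GPIO_WordWrite body is left as a parameter because its assembly was not shown. The names `loop_cycles` and `toggle_hz` are hypothetical.

```c
#include <stdint.h>

/* Cycle costs as assumed in the post above, not measured values. */
enum {
    DATA_OP_CYCLES = 1, /* MOV, ORR */
    BRANCH_CYCLES  = 3  /* B, BL */
};

/* Estimated cycles for one pass of the loop in the listing:
 * 7 data-processing instructions, 2 BL + 1 B, and two calls into
 * GPIO_WordWrite whose body costs callee_cycles each (return included). */
uint32_t loop_cycles(uint32_t callee_cycles)
{
    return 7u * DATA_OP_CYCLES + 3u * BRANCH_CYCLES + 2u * callee_cycles;
}

/* Output frequency in Hz: one loop pass is one square-wave period. */
uint32_t toggle_hz(uint32_t mclk_hz, uint32_t callee_cycles)
{
    return mclk_hz / loop_cycles(callee_cycles);
}
```

For instance, a purely hypothetical callee cost of 10 cycles gives 36 cycles per loop, which at 48 MHz is about 1.33 MHz. The real callee cost has to come from the disassembly of GPIO_WordWrite.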

Also, you should consider what memory space you are running from and whether or not there could be any stalls on memory accesses...for example if you are fetching instructions from a slower external SRAM etc. There could be some extra cycles that aren't obvious by just looking at the code generated.

Overall, I think the function calls are where you are losing most of your time. Depending on how you need to do the I/O writes you could make major speed improvements by eliminating the function calls and simply writing the memory mapped GPIO peripheral addresses...for example:

GPIO0->PD = 0x0000;

GPIO0->PD = 0xFFFF;

will certainly be faster than calling GPIO_WordWrite(). Since the address of GPIO0 has to be loaded into a register and then the offset to the PD must be added, you could even save one more instruction by pointing directly at the PD register with your code rather than using the GPIO structure...

*(volatile u16 *)0xE000300C = 0x0000;

*(volatile u16 *)0xE000300C = 0xFFFF;

This will execute even a wee bit faster. Take a look at the difference in the compiler output for these lines of code. This should be much closer to what you expect. Keep in mind that the STR7x Library is a nice and handy way to play with the peripherals and learn how to program them quickly, but in general it will not produce the most efficient or fastest code. I even think the Library's user manual makes a similar disclaimer in the first couple of pages.
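
To make the two forms easy to compare, here is a minimal sketch. The struct layout is an assumption, chosen so that PD sits at offset 0x0C to match the 0xE000300C address used above; it is not copied from the library headers. The names `gpio_regs_t`, `GPIO0_PD` and `write_pulse` are hypothetical. Taking the register pointer as a parameter lets the same code be tried on a PC against an ordinary variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed register layout: three 16-bit configuration registers and the
 * data register, each padded to a 32-bit slot, so PD lands at 0x0C. */
typedef struct {
    volatile uint16_t PC0; uint16_t pad0;
    volatile uint16_t PC1; uint16_t pad1;
    volatile uint16_t PC2; uint16_t pad2;
    volatile uint16_t PD;  uint16_t pad3;
} gpio_regs_t;

/* Direct pointer to the data register, address taken from the post. */
#define GPIO0_PD ((volatile uint16_t *)0xE000300Cu)

/* Drive all 16 pins low, then high, with two plain stores and no call
 * into the library. 'volatile' keeps the compiler from merging them. */
static inline void write_pulse(volatile uint16_t *pd)
{
    *pd = 0x0000u;
    *pd = 0xFFFFu;
}
```

On the target this would be called as write_pulse(GPIO0_PD) inside the loop. Whether the call is actually inlined depends on the compiler's optimisation settings, so it is worth checking the generated assembly.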

Hopefully I've given you a few things to think about. Let me know if I can help understand your specific problem better.

Best regards,

Ryan.

johnsim · 2005-09-21

Posted on September 21, 2005 at 11:56

Hi Ryan,

That's certainly better, but not as quick as I'd have expected. I'll have to write more of my application to see whether it will do the job. Changing the optimisation level of the compiler helps too.

I guess the real problem here is that you've got to keep the pipeline stocked full of code to execute. I'll only get a proper idea once I've got more code, as I have to do some calculations before accessing external memory.
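
One common way to stop branches from dominating a tight loop like the test above is to unroll it, so that several port writes share one backward branch. The sketch below illustrates that idea and is not something measured in this thread. `toggle_unrolled` is a hypothetical name, and the port register is passed in as a pointer so the code can also be run on a host.

```c
#include <stdint.h>

/* Emit four low/high pulses per call. If the compiler inlines this into
 * the calling loop, the loop's backward branch is paid once per four
 * periods instead of once per period. */
static inline void toggle_unrolled(volatile uint16_t *pd)
{
    *pd = 0x0000u; *pd = 0xFFFFu; /* pulse 1 */
    *pd = 0x0000u; *pd = 0xFFFFu; /* pulse 2 */
    *pd = 0x0000u; *pd = 0xFFFFu; /* pulse 3 */
    *pd = 0x0000u; *pd = 0xFFFFu; /* pulse 4 */
}
```

The trade-off is an uneven waveform: the high phase after the fourth pulse is stretched by the loop branch, so this suits burst-style output better than a clean clock.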

All the code is to execute from internal memory. It will generate addresses for the external memory, and data will transfer directly from my PLD (which is using some neat VHDL code) to memory and back. This means that I don't need to take data in and out of the ARM chip, which it's not supposed to be good at.

All the best,

John.