Instruction cycles and gpio access

klaasdc
Associate II
Posted on January 13, 2013 at 18:58

I wanted to see how fast the gpio pins can be 'pulsed', so I created the following piece of code:


/* Switch to the 16 MHz crystal oscillator (HSE) clock */
CLK_ClockSwitchConfig(CLK_SWITCHMODE_AUTO, CLK_SOURCE_HSE, DISABLE, DISABLE);

/* Output Fcpu on the CLK_CCO pin */
CLK_CCOConfig(CLK_OUTPUT_CPU);

GPIO_DeInit(GPIOD);
GPIO_Init(GPIOD, GPIO_PIN_3, GPIO_MODE_OUT_PP_LOW_FAST);

while (1) {
    GPIOD->ODR |= (uint8_t)(GPIO_PIN_3);
    GPIOD->ODR &= (uint8_t)(~GPIO_PIN_3);
}

Then I connected a scope to the GPIO D3 output and the CCO output (scope screenshot: 0690X00000604zCQAQ.png). As you can see, about 6 CPU cycles are needed to turn the GPIO on and off, resulting in 16 MHz / 6 ≈ 2.67 MHz for the GPIO signal. I found this a bit slow, as the code translates to the following instructions (I have used writeHigh and writeLow as macros for the GPIO->ODR access):

main.c:107  writeHigh(GPIOD, GPIO_PIN_3);
0x81bb <main+40>  0x7216500F  BSET 0x500f,#3
main.c:108  writeLow(GPIOD, GPIO_PIN_3);
0x81bf <main+44>  0x7217500F  BRES 0x500f,#3
0x81c3 <main+48>  0x20F6      JRT  0x81bb

So in fact only 1 instruction (BSET or BRES) is needed to control a GPIO pin. According to the CPU programming manual PM0044 (p. 70), these instructions are 4 bytes long and execute in 1 cycle. Why would it take 2 cycles in the above example? I don't understand...
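
The writeHigh and writeLow macros are mentioned above but not shown. A minimal sketch of what they might look like follows, written so it can be tried on a PC. MockPort, PIN_3, pulse_once and pulse_freq_hz are hypothetical names introduced for illustration only; on the real part the port would be GPIOD and the mask GPIO_PIN_3. The last function just repeats the 16 MHz / 6 arithmetic from the scope measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's GPIO port type: only ODR is modeled. */
typedef struct { volatile uint8_t ODR; } MockPort;

#define PIN_3 ((uint8_t)0x08)  /* bit 3 of the port */

/* Assumed macro definitions: a read-modify-write on ODR with a constant
   single-bit mask, which a compiler may reduce to one BSET or BRES. */
#define writeHigh(port, pin) ((port)->ODR |= (uint8_t)(pin))
#define writeLow(port, pin)  ((port)->ODR &= (uint8_t)~(pin))

/* Pulse a pin once and return the resulting ODR value. */
uint8_t pulse_once(MockPort *port, uint8_t pin)
{
    assert(pin != 0);
    writeHigh(port, pin);
    writeLow(port, pin);
    return port->ODR;
}

/* Output frequency in Hz (rounded down) when one high/low period
   costs cycles_per_period CPU cycles. */
uint32_t pulse_freq_hz(uint32_t f_cpu_hz, uint32_t cycles_per_period)
{
    assert(cycles_per_period > 0);
    return f_cpu_hz / cycles_per_period;
}
```

With the measured 6 cycles per period, pulse_freq_hz(16000000, 6) gives the roughly 2.67 MHz seen on the scope.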
fggnrc2
Associate II
Posted on January 14, 2013 at 08:54

The core takes more time because the cycle counts in PM0044 apply when the core executes an instruction sequence that is long enough to keep its pipeline full. A jump flushes the pipeline, so the instruction that follows it takes longer to execute.

Loop unrolling is a common way to decrease pipeline flush penalties:

while (1) {
    GPIOD->ODR |= (uint8_t)(GPIO_PIN_3);
    GPIOD->ODR &= (uint8_t)(~GPIO_PIN_3);
    GPIOD->ODR |= (uint8_t)(GPIO_PIN_3);
    GPIOD->ODR &= (uint8_t)(~GPIO_PIN_3);
    GPIOD->ODR |= (uint8_t)(GPIO_PIN_3);
    GPIOD->ODR &= (uint8_t)(~GPIO_PIN_3);
    GPIOD->ODR |= (uint8_t)(GPIO_PIN_3);
    GPIOD->ODR &= (uint8_t)(~GPIO_PIN_3);
}

In this case, loop unrolling may give no improvement, because there is a structural limit.

When a bit is set or cleared this way, the register value must be read and the new value must be written back.

This read-modify-write needs two bus access cycles, and that number cannot be reduced.

To pulse faster, the read has to be avoided by using plain writes:

while (1) {
    GPIOD->ODR = 8;
    GPIOD->ODR = 0;
}

However, this overwrites the other bits of the port too, so it usually isn't what one wants...
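
One possible way to keep the other bits is to read the port once before the loop and precompute both byte values, so that the loop itself contains only stores. The sketch below is an assumption, not something measured in this thread, and it is only valid if nothing else (an interrupt handler, for example) changes the port while the loop runs. MockPort and pulse_with_plain_writes are hypothetical names so the idea can be checked on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a GPIO port: only ODR is modeled. */
typedef struct { volatile uint8_t ODR; } MockPort;

/* Emit `pulses` high/low pulses on the bits in `mask` using plain stores. */
void pulse_with_plain_writes(MockPort *port, uint8_t mask, unsigned pulses)
{
    assert(mask != 0);

    /* Read the port once, outside the loop, and precompute both values. */
    const uint8_t base = port->ODR;
    const uint8_t high = (uint8_t)(base | mask);
    const uint8_t low  = (uint8_t)(base & (uint8_t)~mask);

    while (pulses--) {
        port->ODR = high;  /* store only, no read of ODR */
        port->ODR = low;   /* other bits keep the value read above */
    }
}
```

Whether this is actually faster on the STM8 than BSET/BRES depends on the instructions the compiler emits for the two stores, so it would need to be checked on the scope.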

EtaPhi

BTW. There is a shorter way to toggle a bit, but I don't know when a C compiler outputs it:

ToggleLoop
    BCPL 0x500F,#3
    JRA ToggleLoop

klaasdc
Associate II
Posted on January 14, 2013 at 10:10

Thanks for the informative reply!

Does that mean that the core has some sort of "prefetch" for the BSET instruction that inserts a memory read?

And, how can the BCPL instruction be faster; it would also need to read the bit value before being able to complement it, right?

fggnrc2
Associate II
Posted on January 14, 2013 at 14:11

The STM8 core has a Harvard architecture with a 32-bit code bus and an 8-bit data bus.

For this reason, a flash read cycle fills four bytes of its 64-bit pre-fetch queue.

One instruction is decoded immediately, while the other bytes remain available for later use.

When a jump instruction changes the instruction flow, the contents of the pre-fetch queue become useless, but the following flash read refills it immediately.

Chapter 5 of PM0044 gives more details on the STM8 core.

As for the BCPL instruction, it only reduces code size: it needs just 4 bytes, while 8 bytes are needed to store a BSET/BRES pair.

Code alignment plays some role, because the pre-fetch queue is 64 bits wide, so it can hold a BCPL/JRA pair when the pair is properly aligned.
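
To make the alignment point concrete, here is a small helper that counts how many aligned blocks a run of code bytes touches. It is only a sketch and assumes that fetches start on addresses that are multiples of the block size, which is not stated explicitly above. blocks_spanned is a hypothetical name. The 10-byte loop from the first post (BSET + BRES + JRT at 0x81bb) and the 6-byte BCPL/JRA pair serve as examples.

```c
#include <assert.h>
#include <stdint.h>

/* Number of aligned `window`-byte blocks touched by `len` code bytes
   starting at `addr` (assumes fetches are aligned to `window`). */
unsigned blocks_spanned(uint32_t addr, uint32_t len, uint32_t window)
{
    assert(len > 0 && window > 0);
    uint32_t first = addr / window;               /* block holding the first byte */
    uint32_t last  = (addr + len - 1u) / window;  /* block holding the last byte */
    return (unsigned)(last - first + 1u);
}
```

Under that assumption, the loop at 0x81bb touches four 32-bit words, while the same 10 bytes placed at 0x81bc would touch three. A 6-byte BCPL/JRA pair fits in a single 8-byte block only when its start address modulo 8 is 0, 1 or 2.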

I usually ignore these small details because my code is almost never time critical...

luca239955_stm1_st
Senior II
Posted on January 15, 2013 at 16:24

BTW. There is a shorter way to toggle a bit, but I don't know when a C compiler outputs it:

ToggleLoop

    BCPL 0x500F,#3

    JRA ToggleLoop

Example:

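/* Compiler-specific "@" syntax: place the variable at an absolute address (0x00 as posted) */
volatile char PA_DR @0x00;

/* Bit variable mapped onto bit 3 of PA_DR */
volatile _Bool PA3 @PA_DR:3;

void prova(void) {
    PA3 ^= 1;
    PA3 = ~PA3;
}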
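
The "@" placement syntax above is compiler-specific. For comparison, a toggle written in plain C on the whole register is just an XOR with the pin mask. The sketch below uses a hypothetical MockPort type and toggle_pin function so it can be tried on a PC. Whether a given STM8 compiler turns this into a single BCPL is an assumption that should be checked in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a GPIO port: only ODR is modeled. */
typedef struct { volatile uint8_t ODR; } MockPort;

/* Complement the bits selected by `mask`, leaving the others unchanged. */
void toggle_pin(MockPort *port, uint8_t mask)
{
    assert(mask != 0);
    port->ODR ^= mask;  /* read-modify-write: XOR flips only the masked bits */
}
```

On the target, the equivalent statement would be GPIOD->ODR ^= (uint8_t)GPIO_PIN_3; inside the loop.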

klaasdc
Associate II
Posted on January 16, 2013 at 09:30

Cool, I'll give that a try!