Understanding STM8 pipelining

Question asked by rumpeltux . on Oct 9, 2017

I’m trying to understand STM8 pipelining so that I can predict how many cycles my code will need.
(I already asked this on Stack Overflow, but I figured this forum is probably the better audience.)

I have this example, in which I toggle a GPIO pin and hold it for 4 cycles in each state. If and only if the loop is aligned at a 4-byte boundary + 3, the pin stays active (high) for 5 cycles, i.e. one more than it should. I wonder why.

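// Switches port D2, 5 cycles high, 4 cycles low
void main(void) {
     __asm
         bset 0x5011, #2 ; output mode
         bset 0x5012, #2 ; push-pull
         bset 0x5013, #2 ; fast switching

         jra _loop
     .bndry 4
         nop             ; 3 bytes of padding, so that _loop
         nop             ; sits at a 4-byte boundary + 3
         nop
     _loop:
         nop
         bset 0x500f, #2 ; pin high
         nop
         nop
         nop
         bres 0x500f, #2 ; pin low
         jra _loop
     __endasm;
}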

A bit more context:

  • bset/bres are 4-byte instructions; nop is 1 byte.
  • The nop/bset/bres instructions take 1 cycle each.
  • The jra instruction takes two cycles. I think that in the first cycle the instruction cache is filled with the next 32-bit word, which in this case contains only the nop instruction, and that in the second cycle the CPU is simply stalled while the next instruction is decoded.
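
To see how these sizes interact with the 32-bit fetch, the layout can be tabulated per alignment. The following is only a sketch in host-side C, not STM8 code. It assumes the loop body is nop, bset, nop, nop, nop, bres, jra (the sequence in the cycle list in this post), that jra is 2 bytes long, and that fetches are aligned 4-byte words. The names insn_size and straddles_word are made up for illustration.

```c
#include <stddef.h>

/* Loop body assumed from the cycle list: nop, bset, nop, nop, nop, bres, jra.
   Sizes in bytes: nop/bset/bres as stated in the post; jra assumed 2 bytes
   (opcode + 8-bit offset). */
enum { N_INSNS = 7 };
static const unsigned insn_size[N_INSNS] = { 1, 4, 1, 1, 1, 4, 2 };

/* Returns 1 if instruction `idx` of the loop crosses a 4-byte word boundary
   when the loop starts at byte offset `start` (0..3) past a 4-byte boundary,
   and 0 if it lies entirely within one aligned word. */
int straddles_word(unsigned start, size_t idx)
{
    unsigned first = start;
    for (size_t i = 0; i < idx; i++)
        first += insn_size[i];               /* address of the first byte */
    unsigned last = first + insn_size[idx] - 1;  /* address of the last byte */
    return (first / 4) != (last / 4);
}
```

For a loop starting at a 4-byte boundary + 3, this reports bset (index 1) as fitting in a single word, while bres (index 5) and jra (index 6) each span two words.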

So in cycles:

  1. bres clears the pin
  2. jra, pipeline flush, nop fetch
  3. nop decode, bset fetch
  4. nop execute, bset decode, next nop fetch
  5. bset execute sets the pin
  6. nop, bres fetch
  7. nop
  8. nop, bres decode
  9. bres execute clears the pin

According to this, the pin should stay LOW for 4 cycles and HIGH for 4 cycles, but it’s staying HIGH for 5 cycles.

In any other alignment case, the pin is LOW/HIGH for 4 cycles as expected.
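
The 4/4 expectation can be reproduced with a naive count that ignores fetch and decode stalls entirely. This is a minimal sketch in host-side C, assuming the loop body is nop, bset, nop, nop, nop, bres, jra and using the cycle counts listed above. The names loop_body and cycles_between are hypothetical.

```c
#include <stddef.h>

enum op { NOP, BSET, BRES, JRA };

/* Cycle counts as stated in the post: nop/bset/bres = 1, jra = 2. */
static const unsigned cycles[] = { [NOP] = 1, [BSET] = 1, [BRES] = 1, [JRA] = 2 };

/* Assumed loop body, taken from the cycle list. */
static const enum op loop_body[] = { NOP, BSET, NOP, NOP, NOP, BRES, JRA };
#define LOOP_LEN (sizeof loop_body / sizeof loop_body[0])

/* Cycles from the end of the first `from` instruction to the end of the
   next `to` instruction, wrapping around the loop. No stalls are modeled.
   Both `from` and `to` must occur in loop_body. */
unsigned cycles_between(enum op from, enum op to)
{
    size_t i = 0;
    while (loop_body[i] != from)
        i++;                          /* locate the starting instruction */

    unsigned total = 0;
    do {
        i = (i + 1) % LOOP_LEN;       /* advance, wrapping at the jra */
        total += cycles[loop_body[i]];
    } while (loop_body[i] != to);
    return total;
}
```

Both cycles_between(BSET, BRES) and cycles_between(BRES, BSET) give 4, so the extra high cycle has to come from something this model leaves out.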

I think that if the pin stays high for an extra cycle, the execution pipeline must be stalled after the bset instruction (the nops that follow give enough time to ensure that the later bres is ready to execute immediately). But according to my understanding, the nop for cycle 6 would already have been fetched in cycle 4.

Any idea how this behavior can be explained? I couldn’t find any hints in the manual.