STM32F0Discovery timing issue

chrispadbury · ‎2012-08-29

Posted on August 29, 2012 at 17:11

I am trying to have a simple millisecond delay function and experiencing some very strange results. I am using Keil uVision V4.00.0. Here is my code:

#include "stm32f0_discovery.h"
//prototypes 
void Delay_ms(long ms); 
//configure clocks 
void RCC_Configuration(void) 
{ 
/* --------------------------- System Clocks Configuration -----------------*/ 
/* GPIOC clock enable */ 
RCC_AHBPeriphClockCmd(RCC_AHBPeriph_GPIOC, ENABLE); 
} 
//configure GPIO 
void GPIO_Configuration(void) 
{ 
GPIO_InitTypeDef GPIO_InitStructure; 
/*-------------------------- GPIO Configuration ----------------------------*/ 
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_6 | GPIO_Pin_7 | GPIO_Pin_8 | GPIO_Pin_9; 
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT; 
GPIO_InitStructure.GPIO_OType = GPIO_OType_PP; 
GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_UP; 
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz; 
GPIO_Init(GPIOC, &GPIO_InitStructure); 
} 
int main(void) { 
RCC_Configuration(); 
GPIO_Configuration(); 
while(1) { 
GPIO_ResetBits(GPIOC, GPIO_Pin_9); 
GPIO_SetBits(GPIOC, GPIO_Pin_8); 
Delay_ms(5); 
GPIO_ResetBits(GPIOC, GPIO_Pin_8); 
GPIO_SetBits(GPIOC, GPIO_Pin_9); 
Delay_ms(5); 
} 
} 
void Delay_ms(long ms) { 
//ms = ms * 48000 / 4; 
ms = ms * 38400 / 4; 
//ms = 48000; 
while (ms) { 
ms--; 
} 
}

This creates an 83.33Hz squarewave on pin PC8 (measured using a scope). I had expected to use a value of 48000 rather than 38400, but that gave an 80Hz squarewave. I am trying to create 100Hz, which is why I changed it to 38400, expecting to increase the frequency by 25%. I have tried some different lines and get very strange behaviour:

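| Multiplier | Frequency (Hz) | Debug starting hex | Debug starting dec | Clk per loop |
|---|---|---|---|---|
| 48000 | 80 | | | 5 |
| 24000 | 00 | | | 5 |
| 12000 | 00 | | | 5 |
| 20000 | 00 | | | 5 |
| 38400 | 83.33 | BB80 | 48000 | 6 |
| 38402 | 00 | BB82 | 48002 | 5 |
| 38401 | 100 | BB81 | 48001 | 5 |
| 38399 | 00 | BB7E | 47998 | 5 |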

What's so special about "ms = ms * 38400 / 4"? I had expected the loop to take 4 clock cycles (hence starting with "ms = ms * 48000 / 4"). Next I tried hardwiring the starting values, like using "ms = 48000". 48,000 created 125Hz and 60,000 created 100Hz, so that's 4 clocks per loop iteration. Looking at the debug, the starting hex in the first case is BB80! I'm using Level 3 (O3) optimization under Keil. Any ideas why I am seeing such strange behaviour? #loop-timing #keil-uvision #stm32f051
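The arithmetic behind these numbers can be sketched on a PC. This is only a model: it assumes a 48 MHz core clock (implied by the 48000 figure but not stated outright) and a fixed cycle count per loop pass, and the function names are made up for illustration.

```c
/* Assumed core clock: 48 MHz (implied by the 48000 cycles/ms figure). */
#define CORE_HZ 48000000.0

/* Loop count that Delay_ms() computes from its argument. */
long delay_iterations(long ms, long multiplier)
{
    return ms * multiplier / 4;
}

/* Square-wave frequency when each half period is one Delay_ms() call
 * costing cycles_per_iteration core clocks per loop pass.
 * Call overhead and the GPIO writes are ignored. */
double square_wave_hz(long iterations, int cycles_per_iteration)
{
    double half_period_s = (double)iterations * cycles_per_iteration / CORE_HZ;
    return 1.0 / (2.0 * half_period_s);
}
```

Under those assumptions, square_wave_hz(delay_iterations(5, 48000), 5) comes out at about 80 Hz, while a hardwired 48000 passes at 4 cycles each gives about 125 Hz and 60000 passes gives about 100 Hz, which matches the measurements quoted above.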

Tesla DeLorean · ‎2012-08-29

Posted on August 29, 2012 at 18:34

Software delay loops are notoriously difficult to control, and they can also be affected by flash line alignment and prefetch mechanisms.

You'd need to review the assembler to see what the compiler has done with the code, or whether it has unrolled the loop. Declaring the index variable as "volatile" will also change what the optimizer does.

I haven't looked to see if the F0 has a core cycle counter or not. I'd use TIM PWM to generate accurate and consistent pulses.
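To illustrate the remark about volatile, here is a minimal sketch of a spin loop whose counter is volatile. The function name and the returned pass count are hypothetical, added only so the behaviour can be checked on a PC. It keeps the optimizer from deleting the loop, but the cycles per pass still depend on the compiler and on code placement, so it would need calibrating just like the original.

```c
#include <stdint.h>

/* Hypothetical helper: spin for `count` passes.
 * The volatile qualifier forces a real load and store of `remaining`
 * on every pass, so the optimizer cannot delete the loop. */
uint32_t spin_volatile(uint32_t count)
{
    volatile uint32_t remaining = count;
    uint32_t passes = 0;

    while (remaining) {
        remaining--;
        passes++;
    }
    return passes; /* returned only so the behaviour can be checked */
}
```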

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

chrispadbury · ‎2012-08-30

Posted on August 30, 2012 at 13:28

Thanks clive1. I set up the squarewave to calibrate my Delay_ms sub-routine as I would like to use it for other delays in F0 codes.

I've reviewed the assembler in Keil debug and the relevant sections are below. I see that the ms=48000 version doesn't perform a compare after subtracting 1 from r0, whereas the other two do, so I understand why the loop takes 4 clock cycles in this case, and not the case of ms=ms*38401/4, thus giving a frequency of 125Hz.

The only real difference I see between the ms=ms*38400/4 and ms=ms*38401/4 assembled versions is the addition of three lines for the ms=ms*38401/4 version, outside the loop but in the subroutine (please confirm). These lines are:

0x0800012A 0000 MOVS r0,r0 
0x0800012C 9601 STR r6,[sp,#0x04] 
0x0800012E 0000 MOVS r0,r0

As far as I can tell these lines pointlessly move register r0 to r0, save r6 to RAM, and pointlessly move r0 to r0. They are an addition to the ms=ms*38401/4 version so should slow that one down (only by a tiny amount). In both the ms=48000 and ms=ms*38400/4 cases r0 starts with the value 0x000BB80, so should run the loop 48000 times (for ms=ms*38401/4 r0 starts with 0x000BB81).

For ms=48000:

47: ms = 48000; 
48: while (ms) { 
0x08000114 4801 LDR r0,[pc,#4] ; @0x0800011C 
49: ms--; 
50: } 
0x08000116 1E40 SUBS r0,r0,#1 
48: while (ms) { 
49: ms--; 
50: } 
0x08000118 D1FD BNE 0x08000116 
51: } 
0x0800011A 4770 BX lr 
0x0800011C BB80 DCW 0xBB80 ; ? Undefined

For ms=ms*38400/4:

47: ms = ms * 38400 / 4; 
0x08000114 214B MOVS r1,#0x4B 
0x08000116 0249 LSLS r1,r1,#9 
0x08000118 4348 MULS r0,r1,r0 
0x0800011A 17C1 ASRS r1,r0,#31 
0x0800011C 0F89 LSRS r1,r1,#30 
0x0800011E 1808 ADDS r0,r1,r0 
0x08000120 1080 ASRS r0,r0,#2 
48: while (ms) { 
0x08000122 E000 B 0x08000126 
49: ms--; 
50: } 
0x08000124 1E40 SUBS r0,r0,#1 
48: while (ms) { 
49: ms--; 
50: } 
0x08000126 2800 CMP r0,#0x00 
0x08000128 D1FC BNE 0x08000124 
51: } 
0x0800012A 4770 BX lr

For ms=ms*38401/4:

47: ms = ms * 38401 / 4; 
0x08000114 4905 LDR r1,[pc,#20] ; @0x0800012C 
0x08000116 4348 MULS r0,r1,r0 
0x08000118 17C1 ASRS r1,r0,#31 
0x0800011A 0F89 LSRS r1,r1,#30 
0x0800011C 1808 ADDS r0,r1,r0 
0x0800011E 1080 ASRS r0,r0,#2 
48: while (ms) { 
0x08000120 E000 B 0x08000124 
49: ms--; 
50: } 
0x08000122 1E40 SUBS r0,r0,#1 
48: while (ms) { 
49: ms--; 
50: } 
0x08000124 2800 CMP r0,#0x00 
0x08000126 D1FC BNE 0x08000122 
51: } 
0x08000128 4770 BX lr 
0x0800012A 0000 MOVS r0,r0 
0x0800012C 9601 STR r6,[sp,#0x04] 
0x0800012E 0000 MOVS r0,r0

So I understand why the ms=48000 version gives 125Hz and why the ms=ms*38401/4 version gives 100Hz, but I am confused as to why the ms=ms*38400/4 version runs slow at 83.33Hz.
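A rough cycle tally for the loop shapes in the listings above, using the Cortex-M0 timings of 1 cycle for SUBS, CMP and NOP and 3 cycles for a taken conditional branch. It assumes no extra instruction-fetch stalls, which is the very assumption the 38400 build appears to break; the enum and function names are invented for this sketch.

```c
#include <stddef.h>

/* Instructions that appear inside the delay loops above. */
enum op { OP_SUBS, OP_CMP, OP_NOP, OP_BNE_TAKEN };

/* Cortex-M0 core cycles per instruction, ignoring flash wait states. */
static int op_cycles(enum op o)
{
    switch (o) {
    case OP_BNE_TAKEN: return 3; /* taken branch refills the pipeline */
    default:           return 1; /* SUBS, CMP and NOP are single-cycle */
    }
}

/* Sum the cycles of one pass through a loop body. */
int loop_cycles(const enum op *body, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += op_cycles(body[i]);
    return total;
}
```

With this model the SUBS/BNE loop of the ms=48000 build costs 4 cycles per pass and the SUBS/CMP/BNE loop costs 5, so the tally alone does not account for the 6 cycles observed with ms=ms*38400/4.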

chrispadbury · ‎2012-08-30

Posted on August 30, 2012 at 13:48

Attached are some screen grabs for each of the three cases if that helps

________________

Attachments :

screen_grabs.zip : https://st--c.eu10.content.force.com/sfc/dist/version/download/?oid=00Db0000000YtG6&ids=0680X000006HznL&d=%2Fa%2F0X0000000bQD%2FcQ33tgdvmyHOH7rZHEI_PQzGQXh6YlUAlOC5hmHraGU&asPdf=false

chrispadbury · ‎2012-08-30

Posted on August 30, 2012 at 16:16

Adding a NOP in the loop gives a result that makes sense for the ms=ms*38400/4 version (83.33Hz still!):

47: ms = ms * 38400 / 4; 
0x08000114 214B MOVS r1,#0x4B 
0x08000116 0249 LSLS r1,r1,#9 
0x08000118 4348 MULS r0,r1,r0 
0x0800011A 17C1 ASRS r1,r0,#31 
0x0800011C 0F89 LSRS r1,r1,#30 
0x0800011E 1808 ADDS r0,r1,r0 
0x08000120 1080 ASRS r0,r0,#2 
48: while (ms) { 
0x08000122 E001 B 0x08000128 
49: ms--; 
0x08000124 1E40 SUBS r0,r0,#1 
50: __nop(); 
51: } 
0x08000126 BF00 NOP 
48: while (ms) { 
49: ms--; 
50: __nop(); 
51: } 
0x08000128 2800 CMP r0,#0x00 
0x0800012A D1FB BNE 0x08000124 
52: } 
0x0800012C 4770 BX lr 
0x0800012E 0000 MOVS r0,r0

But makes the ms=ms*38401/4 version run at a strange frequency of 43Hz:

47: ms = ms * 38401 / 4; 
0x08000114 4905 LDR r1,[pc,#20] ; @0x0800012C 
0x08000116 4348 MULS r0,r1,r0 
0x08000118 17C1 ASRS r1,r0,#31 
0x0800011A 0F89 LSRS r1,r1,#30 
0x0800011C 1808 ADDS r0,r1,r0 
0x0800011E 1080 ASRS r0,r0,#2 
48: while (ms) { 
0x08000120 E001 B 0x08000126 
49: ms--; 
0x08000122 1E40 SUBS r0,r0,#1 
50: __nop(); 
51: } 
0x08000124 BF00 NOP 
48: while (ms) { 
49: ms--; 
50: __nop(); 
51: } 
0x08000126 2800 CMP r0,#0x00 
0x08000128 D1FB BNE 0x08000122 
52: } 
0x0800012A 4770 BX lr 
0x0800012C 9601 STR r6,[sp,#0x04] 
0x0800012E 0000 MOVS r0,r0

So the ms=ms*38400/4 version is running at a rate of 6 clock cycles per loop, and the ms=ms*38401/4 version is running at a rate of 7 clock cycles per loop.

chrispadbury · ‎2012-08-30

Posted on August 30, 2012 at 16:52

OK, I think I am getting somewhere, and I suspect it might be to do with the flash alignment that you suggested:

If I insert a __nop(); line into the subroutine before ms=... then it changes the frequency of the ms=ms*38401/4 version to 83.33Hz! The ms=ms*38400/4 version now runs at 100Hz.

What made me check this was the address of the loop back was originally 0x08000124 for the ms=ms*38400/4 version (erroneously running at 83.33Hz) whilst it was 0x08000122 for the ms=ms*38401/4 version.

With the NOP inserted the address of the loop back is now 0x08000126 for the ms=ms*38400/4 version (correctly running at 100Hz) whilst it was 0x08000124 for the ms=ms*38401/4 version (erroneously running at 83.33Hz).

"What's so special about the address 0x08000124?" I wondered.

Adding an extra NOP makes the loop back addresses 0x08000128 for ms=ms*38400/4 and 0x08000126 for ms=ms*38401/4 respectively. This makes the ms=ms*38400/4 (with loopback address of 0x08000128) run slow at 83.33Hz and the ms=ms*38401/4 version run correctly at 100Hz.

Hypothesis:

Having a loop-back address that is an even multiple of 2 (i.e. a multiple of 4, so 32-bit aligned) results in an extra clock cycle.

Does this make sense on the basis of a 32bit code space?
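The hypothesis can at least be checked against the addresses quoted in this thread. The helper below is a hypothetical one-liner that tests for 32-bit alignment; it only restates the pattern and says nothing about why the core or flash interface would behave that way.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr sits on a 32-bit (4-byte) boundary. */
bool is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0;
}
```

The two loop-back addresses that ran slow, 0x08000124 and 0x08000128, are word aligned, while 0x08000122 and 0x08000126, which ran at the expected rate, are only halfword aligned.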

Tesla DeLorean · ‎2012-08-30

Posted on August 30, 2012 at 16:58

Quick observations:

The 0x9601 "opcode" is the literal constant 38401

The 0x0000 preceding is 32-bit alignment

The 38401 example performs an additional iteration for 5 ms

The branch target on the 38400 example has 32-bit alignment
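These observations can be checked with a little host-side C. The sketch below mirrors the four-instruction signed divide by 4 from the listings and computes the starting loop count; the helper names are made up, and asr() exists only because right-shifting a negative value is implementation-defined in C.

```c
#include <stdint.h>

/* Arithmetic shift right that is well defined for negative values too. */
static int32_t asr(int32_t v, unsigned n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Mirrors ASRS r1,r0,#31 / LSRS r1,r1,#30 / ADDS r0,r1,r0 / ASRS r0,r0,#2:
 * a signed divide by 4 that rounds toward zero, like C's x / 4. */
int32_t div4_like_listing(int32_t x)
{
    uint32_t sign = (uint32_t)asr(x, 31); /* all ones if x < 0, else 0 */
    uint32_t bias = sign >> 30;           /* 3 if x < 0, else 0 */
    return asr(x + (int32_t)bias, 2);
}

/* Loop count for Delay_ms(ms) with a given multiplier. */
int32_t start_count(int32_t ms, int32_t multiplier)
{
    return div4_like_listing(ms * multiplier);
}
```

Here 0x4B << 9 is 38400 (the MOVS/LSLS pair) and 0x9601 is 38401 (the literal at 0x0800012C). start_count(5, 38400) gives 0xBB80 and start_count(5, 38401) gives 0xBB81, which is the one additional iteration mentioned above.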

Tesla DeLorean · ‎2012-08-30

Posted on August 30, 2012 at 17:07

Crossed on the wire there.

I don't know enough about the gate level design of the M0, but I suspect the prefetch might be 32-bit. Flash lines tend to be of the 64 or 128-bit width, but this is handled outside the core, perhaps with a preload buffer/cache and a barrel shift.

chrispadbury · ‎2012-08-30

Posted on August 30, 2012 at 17:27

Thanks Clive1. I have used a similar sub-routine with the STM32F4 Discovery (M4) without coming across this issue. It is tricky for me to have to always ensure that the loop back address is 32-bit aligned.

This is interesting and something I am keen to know in more depth to avoid issues. Where would you go to read up on this?

Out of interest, when you need to add a delay of some ms or us, how do you achieve it?

Tesla DeLorean · ‎2012-08-30

Posted on August 30, 2012 at 18:52

I'd prefer to compare deltas on free running counters/timers. These are immune to code placement, and interrupts, etc.

On the F1/F2/F4 you have the core cycle counter in trace unit, DWT_CYCCNT, should easily handle a minute delay with perhaps 5-15 cycle accuracy. The M0 doesn't appear to have this.

For fractional seconds, look at the RTC prescaler's counter.

Commit one timer, not being used or connected to pins, set it up with desired granularity and do a delta delay across CNT values. 16-bit is a bit lame, could clock at 1 MHz, 1000 count 1ms +/- 1us
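A minimal sketch of that delta delay, written so it can be tried on a PC. The read_counter_fn hook and the fake counter are hypothetical; on target the hook would simply return the chosen timer's 16-bit CNT value, with the timer assumed to be free running at 1 MHz.

```c
#include <stdint.h>

/* Hook that reads the free-running 16-bit counter (hypothetical name). */
typedef uint16_t (*read_counter_fn)(void);

/* Busy-wait until at least `ticks` counts have elapsed.
 * Unsigned 16-bit subtraction keeps the delta correct across rollover.
 * Returns the elapsed count actually observed. */
uint16_t delay_ticks(read_counter_fn read_counter, uint16_t ticks)
{
    uint16_t start = read_counter();
    uint16_t elapsed;

    do {
        elapsed = (uint16_t)(read_counter() - start);
    } while (elapsed < ticks);

    return elapsed;
}

/* Fake counter for host testing: advances by one on every read,
 * starting close to the 16-bit rollover. */
static uint16_t fake_cnt = 65000;
static uint16_t fake_read(void) { return fake_cnt++; }
```

With a 1 MHz count rate, delay_ticks(read, 1000) would wait roughly 1 ms regardless of where the code sits in flash, since only the counter delta matters.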

Want to control code more, use the assembler, not the compiler.
