2012-08-29 08:11 AM
I am trying to write a simple millisecond delay function and am experiencing some very strange results. I am using Keil uVision V4.00.0. Here is my code:
#include "stm32f0_discovery.h"
// prototypes
void Delay_ms(long ms);
// configure clocks
void RCC_Configuration(void)
{
    /* --------------------------- System Clocks Configuration -----------------*/
    /* GPIOC clock enable */
    RCC_AHBPeriphClockCmd(RCC_AHBPeriph_GPIOC, ENABLE);
}
// configure GPIO
void GPIO_Configuration(void)
{
    GPIO_InitTypeDef GPIO_InitStructure;
    /*-------------------------- GPIO Configuration ----------------------------*/
    GPIO_InitStructure.GPIO_Pin = GPIO_Pin_6 | GPIO_Pin_7 | GPIO_Pin_8 | GPIO_Pin_9;
    GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
    GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
    GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_UP;
    GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
    GPIO_Init(GPIOC, &GPIO_InitStructure);
}
int main(void) {
    RCC_Configuration();
    GPIO_Configuration();
    while (1) {
        GPIO_ResetBits(GPIOC, GPIO_Pin_9);
        GPIO_SetBits(GPIOC, GPIO_Pin_8);
        Delay_ms(5);
        GPIO_ResetBits(GPIOC, GPIO_Pin_8);
        GPIO_SetBits(GPIOC, GPIO_Pin_9);
        Delay_ms(5);
    }
}
void Delay_ms(long ms) {
    //ms = ms * 48000 / 4;
    ms = ms * 38400 / 4;
    //ms = 48000;
    while (ms) {
        ms--;
    }
}
This creates an 83.33 Hz square wave on pin PC8 (measured with a scope). I had expected to use a value of 48000 rather than 38400, but that gave an 80 Hz square wave. I am trying to create 100 Hz, which is why I changed it to 38400, expecting to increase the frequency by 25%. I have tried some different multipliers and get very strange behaviour:
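| Multiplier | Frequency (Hz) | Debug starting value (hex) | Debug starting value (dec) | Clocks per loop |
| 48000 | 80 | | | 5 |
| 24000 | … | | | 5 |
| 12000 | … | | | 5 |
| 20000 | … | | | 5 |
| 38400 | 83.33 | BB80 | 48000 | 6 |
| 38402 | … | BB82 | 48002 | 5 |
| 38401 | 100 | BB81 | 48001 | 5 |
| 38399 | … | BB7E | 47998 | 5 |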
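The starting values and clock counts in the table can be cross-checked with a few lines of host-side C. This is a sketch, not the poster's code: it assumes a 48 MHz core clock (the value the 48000 multiplier was chosen for), the two Delay_ms(5) calls per period in main(), and it ignores the few cycles spent in the GPIO calls. The names starting_value, square_wave_hz and SYSCLK_HZ are made up for illustration. Integer division truncates, which is why 38399 gives 47998 (0xBB7E) rather than 47999.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SYSCLK = 48 MHz, as implied by the 48000 multiplier. */
#define SYSCLK_HZ 48000000.0

/* Loop count Delay_ms() starts with: ms * multiplier / 4.
   Integer maths truncates, e.g. 5 * 38399 / 4 = 47998 = 0xBB7E. */
static int32_t starting_value(int32_t ms, int32_t multiplier)
{
    return ms * multiplier / 4;
}

/* Square-wave frequency on PC8: one period is two Delay_ms() calls of
   'iterations' loops each, at 'clocks_per_loop' cycles per loop.
   Call overhead and GPIO writes are ignored. */
static double square_wave_hz(int32_t iterations, int clocks_per_loop)
{
    assert(iterations > 0 && clocks_per_loop > 0);
    return SYSCLK_HZ / (2.0 * iterations * clocks_per_loop);
}
```

For example, square_wave_hz(starting_value(5, 48000), 5) gives 80 Hz and square_wave_hz(0xBB80, 6) gives 83.33 Hz, matching the 48000 and 38400 rows; the hardwired 60000 at 4 clocks gives 100 Hz.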
What's so special about "ms = ms * 38400 / 4"? I had expected the loop to take 4 clock cycles (hence starting with "ms = ms * 48000 / 4"). Next I tried hardwiring the starting values, e.g. "ms = 48000". 48,000 gave 125 Hz and 60,000 gave 100 Hz, so that's 4 clocks per loop iteration. Looking at the debugger, the starting value in the ms * 38400 / 4 case is 0xBB80, i.e. the same 48000 that gave 125 Hz when hardwired! I'm using Level 3 (-O3) optimization under Keil. Any ideas why I am seeing such strange behaviour?
#loop-timing #keil-uvision #stm32f051
2012-08-29 09:34 AM
Software delay loops are notoriously difficult to control, and they can also be affected by flash line alignment and prefetch mechanisms.
You'd need to review the assembler output to see what the compiler has done with the code, or whether it has unrolled the loop. Declaring the index variable as "volatile" will also affect what the optimizer does. I haven't checked whether the F0 has a core cycle counter. I'd use TIM PWM to generate accurate and consistent pulses.
2012-08-30 04:28 AM
Thanks clive1. I set up the square wave to calibrate my Delay_ms subroutine, as I would like to use it for other delays in F0 code.
I've reviewed the assembler in the Keil debugger and the relevant sections are below. I see that the ms=48000 version doesn't perform a compare after subtracting 1 from r0, whereas the other two do, so I understand why the loop takes 4 clock cycles in that case (giving a frequency of 125 Hz) but not in the ms=ms*38401/4 case.
The only real difference I see between the ms=ms*38400/4 and ms=ms*38401/4 assembled versions is the addition of three lines in the ms=ms*38401/4 version, outside the loop but inside the subroutine (please confirm). These lines are:
0x0800012A 0000 MOVS r0,r0
0x0800012C 9601 STR r6,[sp,#0x04]
0x0800012E 0000 MOVS r0,r0
As far as I can tell, these lines pointlessly move register r0 to r0, save r6 to RAM, and pointlessly move r0 to r0 again. They are an addition to the ms=ms*38401/4 version, so they should slow that one down (only by a tiny amount). In both the ms=48000 and ms=ms*38400/4 cases r0 starts with the value 0x0000BB80, so the loop should run 48000 times (for ms=ms*38401/4, r0 starts with 0x0000BB81).
For ms=48000:
47: ms = 48000;
48: while (ms) {
0x08000114 4801 LDR r0,[pc,#4] ; @0x0800011C
49: ms--;
50: }
0x08000116 1E40 SUBS r0,r0,#1
48: while (ms) {
49: ms--;
50: }
0x08000118 D1FD BNE 0x08000116
51: }
0x0800011A 4770 BX lr
0x0800011C BB80 DCW 0xBB80 ; ? Undefined
For ms=ms*38400/4:
47: ms = ms * 38400 / 4;
0x08000114 214B MOVS r1,#0x4B
0x08000116 0249 LSLS r1,r1,#9
0x08000118 4348 MULS r0,r1,r0
0x0800011A 17C1 ASRS r1,r0,#31
0x0800011C 0F89 LSRS r1,r1,#30
0x0800011E 1808 ADDS r0,r1,r0
0x08000120 1080 ASRS r0,r0,#2
48: while (ms) {
0x08000122 E000 B 0x08000126
49: ms--;
50: }
0x08000124 1E40 SUBS r0,r0,#1
48: while (ms) {
49: ms--;
50: }
0x08000126 2800 CMP r0,#0x00
0x08000128 D1FC BNE 0x08000124
51: }
0x0800012A 4770 BX lr
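For reference, the instructions the compiler emitted for ms = ms * 38400 / 4 can be mirrored step by step in portable C. This is a host-side sketch with hypothetical helper names, not part of the original code: MOVS/LSLS builds 38400 as 0x4B << 9, and the ASRS/LSRS/ADDS/ASRS sequence divides a signed long by 4 while rounding toward zero, by adding 3 only when the product is negative. The register-style variable names simply follow the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right on a 32-bit register value, written with
   unsigned operations only so the C behaviour is fully defined. */
static uint32_t asr32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    uint32_t fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (v >> n) | fill;
}

/* MOVS r1,#0x4B ; LSLS r1,r1,#9  ->  0x4B << 9 = 38400 */
static uint32_t build_38400(void)
{
    uint32_t r1 = 0x4Bu;
    return r1 << 9;
}

/* r0 = ms on entry, r1 = multiplier; returns r0 after the sequence. */
static uint32_t scale_like_m0(uint32_t r0, uint32_t r1)
{
    r0 = r1 * r0;          /* MULS r0,r1,r0  : low 32 bits of the product  */
    r1 = asr32(r0, 31);    /* ASRS r1,r0,#31 : 0 or 0xFFFFFFFF (sign mask) */
    r1 = r1 >> 30;         /* LSRS r1,r1,#30 : 0, or 3 if product < 0      */
    r0 = r1 + r0;          /* ADDS r0,r1,r0  : bias negatives by 3         */
    return asr32(r0, 2);   /* ASRS r0,r0,#2  : divide by 4, toward zero    */
}
```

With ms = 5 this yields 0xBB80, which is why the debugger shows the same starting value as the hardwired ms = 48000 case; the sign correction only matters for negative ms.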
For ms=ms*38401/4:
47: ms = ms * 38401 / 4;
0x08000114 4905 LDR r1,[pc,#20] ; @0x0800012C
0x08000116 4348 MULS r0,r1,r0
0x08000118 17C1 ASRS r1,r0,#31
0x0800011A 0F89 LSRS r1,r1,#30
0x0800011C 1808 ADDS r0,r1,r0
0x0800011E 1080 ASRS r0,r0,#2
48: while (ms) {
0x08000120 E000 B 0x08000124
49: ms--;
50: }
0x08000122 1E40 SUBS r0,r0,#1
48: while (ms) {
49: ms--;
50: }
0x08000124 2800 CMP r0,#0x00
0x08000126 D1FC BNE 0x08000122
51: }
0x08000128 4770 BX lr
0x0800012A 0000 MOVS r0,r0
0x0800012C 9601 STR r6,[sp,#0x04]
0x0800012E 0000 MOVS r0,r0
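The halfwords after BX lr can be checked by hand. 38401 is odd and larger than 255, so it cannot be formed by a MOVS/LSLS pair, and the compiler loads it with LDR r1,[pc,#20] from a literal pool instead. In Thumb, a PC-relative LDR reads from (instruction address + 4, rounded down to a multiple of 4) + offset, and the STM32 is little-endian, so the lower address holds the low half of the word. The sketch below (hypothetical helper names, host-side only) reproduces the @0x0800012C and @0x0800011C annotations and decodes the 0x9601 halfword.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by a Thumb "LDR Rt,[pc,#imm]" located at insn_addr:
   PC reads as insn_addr + 4, aligned down to a word boundary.
   The encoding only allows word-multiple offsets up to 1020. */
static uint32_t ldr_literal_address(uint32_t insn_addr, uint32_t imm)
{
    assert(imm % 4u == 0u && imm <= 1020u);
    return ((insn_addr + 4u) & ~3u) + imm;
}

/* Rebuild the 32-bit literal from the two halfwords the disassembler
   prints separately (little-endian: lower address = low half). */
static uint32_t literal_word(uint16_t half_at_addr, uint16_t half_at_addr_plus_2)
{
    return (uint32_t)half_at_addr | ((uint32_t)half_at_addr_plus_2 << 16);
}
```

ldr_literal_address(0x08000114, 20) gives 0x0800012C and literal_word(0x9601, 0x0000) gives 38401, so the "STR r6,[sp,#0x04]" and the "MOVS r0,r0" after it are the literal constant being disassembled as if it were code, and the 0x0000 at 0x0800012A pads the literal to a word boundary.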
So I understand why the ms=48000 version gives 125 Hz and why the ms=ms*38401/4 version gives 100 Hz, but I am confused as to why the ms=ms*38400/4 version runs slow at 83.33 Hz.
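One way to isolate the unexplained cycle is to add up per-instruction costs for each compiled loop and compare them with the measurements. This sketch assumes the zero-wait-state Cortex-M0 figures (SUBS, CMP and NOP take 1 cycle, a taken conditional branch takes 3), which match the 4-cycle SUBS/BNE loop measured for ms=48000; any flash wait state or prefetch effect is exactly what it leaves out. The function name is hypothetical.

```c
#include <assert.h>

/* Assumed Cortex-M0 cycle costs with no extra flash wait cycles. */
enum { CYC_SUBS = 1, CYC_CMP = 1, CYC_NOP = 1, CYC_BCC_TAKEN = 3 };

/* Cycles per iteration of a SUBS [+NOPs] [+CMP] + BNE loop. */
static int expected_cycles(int has_cmp, int nops)
{
    assert(nops >= 0);
    return CYC_SUBS + nops * CYC_NOP + (has_cmp ? CYC_CMP : 0) + CYC_BCC_TAKEN;
}
```

This gives 4 cycles for the ms=48000 loop and 5 for both SUBS/CMP/BNE loops, so the ms=ms*38401/4 version matches the prediction (100 Hz) while the ms=ms*38400/4 version takes one cycle more than the instruction costs alone account for.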
2012-08-30 04:48 AM
Attached are some screen grabs for each of the three cases, if that helps.
Attachments: screen_grabs.zip: https://st--c.eu10.content.force.com/sfc/dist/version/download/?oid=00Db0000000YtG6&ids=0680X000006HznL&d=%2Fa%2F0X0000000bQD%2FcQ33tgdvmyHOH7rZHEI_PQzGQXh6YlUAlOC5hmHraGU&asPdf=false
2012-08-30 07:16 AM
Adding a NOP inside the loop gives the expected timing for the ms=ms*38400/4 version (although it still runs at 83.33 Hz!):
47: ms = ms * 38400 / 4;
0x08000114 214B MOVS r1,#0x4B
0x08000116 0249 LSLS r1,r1,#9
0x08000118 4348 MULS r0,r1,r0
0x0800011A 17C1 ASRS r1,r0,#31
0x0800011C 0F89 LSRS r1,r1,#30
0x0800011E 1808 ADDS r0,r1,r0
0x08000120 1080 ASRS r0,r0,#2
48: while (ms) {
0x08000122 E001 B 0x08000128
49: ms--;
0x08000124 1E40 SUBS r0,r0,#1
50: __nop();
51: }
0x08000126 BF00 NOP
48: while (ms) {
49: ms--;
50: __nop();
51: }
0x08000128 2800 CMP r0,#0x00
0x0800012A D1FB BNE 0x08000124
52: }
0x0800012C 4770 BX lr
0x0800012E 0000 MOVS r0,r0
But it makes the ms=ms*38401/4 version run at a strange frequency of 71.43 Hz:
47: ms = ms * 38401 / 4;
0x08000114 4905 LDR r1,[pc,#20] ; @0x0800012C
0x08000116 4348 MULS r0,r1,r0
0x08000118 17C1 ASRS r1,r0,#31
0x0800011A 0F89 LSRS r1,r1,#30
0x0800011C 1808 ADDS r0,r1,r0
0x0800011E 1080 ASRS r0,r0,#2
48: while (ms) {
0x08000120 E001 B 0x08000126
49: ms--;
0x08000122 1E40 SUBS r0,r0,#1
50: __nop();
51: }
0x08000124 BF00 NOP
48: while (ms) {
49: ms--;
50: __nop();
51: }
0x08000126 2800 CMP r0,#0x00
0x08000128 D1FB BNE 0x08000122
52: }
0x0800012A 4770 BX lr
0x0800012C 9601 STR r6,[sp,#0x04]
0x0800012E 0000 MOVS r0,r0
So the ms=ms*38400/4 version is running at a rate of 6 clock cycles per loop, and the ms=ms*38401/4 version is running at a rate of 7 clock cycles per loop.
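The 6 and 7 cycle figures follow from inverting the square-wave formula: clocks per loop = f_clk / (2 × f × N), where N is the starting loop count. A small host-side helper, assuming a 48 MHz core clock and two Delay_ms(5) calls per period (the function name is made up for illustration). With the NOP added, the loop is SUBS + NOP + CMP + BNE, which would be 6 cycles if SUBS, NOP and CMP take 1 cycle each and the taken BNE takes 3; on that assumption the ms=ms*38400/4 version is now on time and the ms=ms*38401/4 version has picked up the extra cycle.

```c
#include <assert.h>

/* Assumption: 48 MHz core clock, two Delay_ms() calls per period. */
static int clocks_per_loop(double freq_hz, long iterations)
{
    assert(freq_hz > 0.0 && iterations > 0);
    double c = 48000000.0 / (2.0 * freq_hz * (double)iterations);
    return (int)(c + 0.5); /* nearest whole cycle; scope readings are rounded */
}
```

clocks_per_loop(83.33, 48000) returns 6 and clocks_per_loop(71.43, 48001) returns 7.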
2012-08-30 07:52 AM
OK, I think I am getting somewhere, and I suspect it has to do with the flash alignment you suggested:
If I insert a __nop(); line into the subroutine before ms=..., it changes the frequency of the ms=ms*38401/4 version to 83.33 Hz! The ms=ms*38400/4 version now runs at 100 Hz. What made me check this was that the loop-back address was originally 0x08000124 for the ms=ms*38400/4 version (erroneously running at 83.33 Hz), whilst it was 0x08000122 for the ms=ms*38401/4 version. With the NOP inserted, the loop-back address is now 0x08000126 for the ms=ms*38400/4 version (correctly running at 100 Hz), whilst it is 0x08000124 for the ms=ms*38401/4 version (erroneously running at 83.33 Hz). "What's so special about the address 0x08000124?" I wondered. Adding an extra NOP makes the loop-back addresses 0x08000128 for ms=ms*38400/4 and 0x08000126 for ms=ms*38401/4 respectively. This makes the ms=ms*38400/4 version (with a loop-back address of 0x08000128) run slow at 83.33 Hz and the ms=ms*38401/4 version run correctly at 100 Hz.
Hypothesis:
Having a loop-back address that is a multiple of 4 (i.e. 32-bit aligned) results in an extra clock cycle. Does this make sense on the basis of a 32-bit code space?
2012-08-30 07:58 AM
Quick observations:
- The 0x9601 "opcode" is the literal constant 38401.
- The 0x0000 preceding it is for 32-bit alignment.
- The 38401 example performs an additional iteration for 5 ms.
- The branch target in the 38400 example has 32-bit alignment.
2012-08-30 08:07 AM
Crossed on the wire there.
I don't know enough about the gate-level design of the M0, but I suspect the prefetch might be 32-bit. Flash lines tend to be 64 or 128 bits wide, but this is handled outside the core, perhaps with a preload buffer/cache and a barrel shifter.
2012-08-30 08:27 AM
Thanks Clive1. I have used a similar subroutine on the STM32F4 Discovery (M4) without coming across this issue. It would be tricky for me to always have to ensure that the loop-back address is 32-bit aligned.
This is interesting and something I am keen to understand in more depth to avoid issues. Where would you go to read up on this? Out of interest, when you need to add a delay of some ms or µs, how do you achieve it?
2012-08-30 09:52 AM
I'd prefer to compare deltas on free-running counters/timers. These are immune to code placement, interrupts, etc.
On the F1/F2/F4 you have the core cycle counter in the trace unit, DWT_CYCCNT, which should easily handle a minute delay with perhaps 5-15 cycle accuracy. The M0 doesn't appear to have this. For fractional seconds, look at the RTC prescaler's counter. Alternatively, commit one timer that is not being used or connected to pins, set it up with the desired granularity, and do a delta delay across CNT values. 16-bit is a bit lame, but you could clock it at 1 MHz: 1000 counts is 1 ms ±1 µs. If you want more control over the code, use the assembler, not the compiler.
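A minimal sketch of the delta-delay idea, assuming a 16-bit timer clocked at 1 MHz as suggested above. The counter read is passed in as a function pointer, a hypothetical stand-in for reading the CNT register of an otherwise unused timer, so the logic can be tried on a host; the fake counter below is for that purpose only. The key point is that the subtraction is done in uint16_t, so a wrap from 0xFFFF to 0x0000 between the start and current readings does not matter, as long as the delay is shorter than one full wrap (65536 counts, about 65.5 ms at 1 MHz).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter source: on the target this would return the
   CNT register of a free-running timer ticking at 1 MHz. */
typedef uint16_t (*counter_read_fn)(void);

/* Ticks elapsed since 'start', correct across a single wrap. */
static uint16_t elapsed16(uint16_t start, uint16_t now)
{
    return (uint16_t)(now - start);
}

/* Busy-wait for 'us' microseconds (1 tick = 1 us). The timing depends
   only on the timer, not on code placement or interrupts. */
static void delay_us(counter_read_fn read_cnt, uint16_t us)
{
    assert(read_cnt);
    uint16_t start = read_cnt();
    while (elapsed16(start, read_cnt()) < us) {
        /* spin */
    }
}

/* Host-side stand-in for the timer: each read advances it by 7 ticks. */
static uint16_t fake_cnt = 0xFFF0u;
static uint16_t read_fake_cnt(void)
{
    fake_cnt = (uint16_t)(fake_cnt + 7u);
    return fake_cnt;
}
```

Starting the fake counter at 0xFFF0 forces a wrap during a 1000 µs delay; elapsed16(0xFFF0, 0x0010) still returns 0x20.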