Does anybody know how to mathematically calculate blocking delays?

arnold_w
Senior
Posted on June 10, 2016 at 13:01

I found a blocking delay function on the Internet (http://thehackerworkshop.com/?p=1209 ) and I made some small changes to it:

void delay(Delay_t delay_)
{
    unsigned int loopsPerMicrosecond = (unsigned int)delay_;
    for (; 0 < delay_; delay_--)
    {
        asm volatile(
            "mov r3, %[loopsPerMicrosecond] \n\t" // load the initial loop counter
            "loop4: \n\t"
            "subs r3, #1 \n\t"
            "bne loop4 \n\t"
            : // empty output list
            : [loopsPerMicrosecond] "r"(loopsPerMicrosecond) // input to the asm routine
            : "r3", "cc" // clobber list
        );
    }
}

Here's what the function looks like in the dump file (I don't know why it appears twice in the dump file):

void delay(Delay_t delay_)
{
unsigned int loopsPerMicrosecond = (unsigned int)delay_;
 801a7e8: 4602 mov r2, r0
for (; 0 < delay_; delay_ --)
 801a7ea: b128 cbz r0, 801a7f8 <loop4+0xa>
{
asm volatile
 801a7ec: 4613 mov r3, r2
0801a7ee <loop4>:
 801a7ee: 3b01 subs r3, #1
 801a7f0: d1fd bne.n 801a7ee <loop4>
void delay(Delay_t delay_)
{
unsigned int loopsPerMicrosecond = (unsigned int)delay_;
for (; 0 < delay_; delay_ --)
 801a7f2: 3801 subs r0, #1
 801a7f4: b2c0 uxtb r0, r0
 801a7f6: e7f8 b.n 801a7ea <delay+0x2>
: //empty output list
: [loopsPerMicrosecond] "r" (loopsPerMicrosecond) //input to the asm routine
: "r3", "cc" //clobber list
);
}
}

I would like to find out what actual delay this function creates at different clock frequencies. Since I'm not used to assembly language, I used an oscilloscope to measure some delays, but the measurements are not accurate because setting the test pin low/high also takes a few instructions:

typedef enum
{
    DELAY_18_us_1_MHz_Clock = 1,
    DELAY_32_us_1_MHz_Clock = 2,
    DELAY_52_us_1_MHz_Clock = 3,
    DELAY_78_us_1_MHz_Clock = 4,
    DELAY_111_us_1_MHz_Clock = 5,
    DELAY_150_us_1_MHz_Clock = 6,
    DELAY_1_point_18_us_16_MHz_Clock = 1,
    DELAY_12_us_16_MHz_Clock = 7,
    DELAY_107_us_16_MHz_Clock = 23,
    DELAY_202_us_16_MHz_Clock = 32,
    DELAY_298_us_16_MHz_Clock = 39,
    DELAY_312_us_16_MHz_Clock = 40,
    DELAY_396_us_16_MHz_Clock = 45,
    DELAY_412_us_16_MHz_Clock = 46,
    DELAY_502_us_16_MHz_Clock = 51,
    DELAY_604_us_16_MHz_Clock = 56
} Delay_t;

Does anybody know a mathematical expression for calculating the delay at various clock frequencies (1, 2, 4, 8, 16, ... MHz)?
Posted on June 10, 2016 at 13:19

> I found a blocking delay function on the Internet ( http://thehackerworkshop.com/?p=1209 ) and I made some small changes to it:

Why? You made a "quadratic" version; what would be the utility of that?

> I would like to find out what actual delay this function creates at different frequencies. Since I'm not used to assembler programmer language,

> I used an oscilloscope to measure some delays, but it's not accurate because setting the test pin low/high also takes a few instructions:

Toggle the pin without calling the delay, that will give you the overhead of pin toggling.

There are many sources of error for this anyway and I am not going to list them here; do your homework yourself. Loop delays are intended to provide "at least xxx long" delays, so the actual delay length is not that important, unless the error is excessive.

JW

arnold_w
Senior
Posted on June 10, 2016 at 13:36

I don't know what quadratic means. It used to be a millisecond delay function, but sometimes I need a delay of (at least) 10 microseconds, so I trimmed it down. I intend to use it only when there are minimum time requirements, but I can't waste time (such as using a 1 ms delay when the minimum is 10 microseconds) because some of these delays are used in loops with many iterations.

Posted on June 10, 2016 at 14:44

The listing file is just mixing up the code and the source lines; it is one function, with the outer loop in C and the inner loop in assembler.

A single loop has the form y = mx + c; the double loop is of the form y = (ax + b)(cz + d). Here both loop counts come from the same argument, so the delay grows with its square.
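As a rough illustration of that form, here is a minimal sketch that estimates the delay from the argument n and the core clock. The coefficients (3, 5, 10) are assumptions fitted by hand to the 1 MHz oscilloscope figures in the question, not values taken from the core documentation, and the constant term includes whatever pin-toggling overhead was in those measurements. The function names are hypothetical. The `uxtb` in the listing suggests the outer counter is truncated to 8 bits, so the sketch assumes n is at most 255.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated CPU cycles for delay(n).
 * Assumed model: 3*n*n + 5*n + 10, fitted to the measurements in the
 * question (about 3 cycles per inner-loop pass, about 5 cycles of
 * outer-loop overhead per pass, about 10 cycles of fixed overhead). */
uint32_t delay_cycles(uint32_t n)
{
    assert(n <= 255u); /* assumption: counter is 8 bits wide */
    return 3u * n * n + 5u * n + 10u;
}

/* Convert the cycle estimate to microseconds for a given core clock. */
double delay_microseconds(uint32_t n, uint32_t clock_hz)
{
    assert(clock_hz > 0u);
    return (double)delay_cycles(n) * 1e6 / (double)clock_hz;
}
```

With these assumed coefficients, `delay_microseconds(23, 16000000)` gives 107 us, and the other table entries come out within a few microseconds of the measured values. Flash wait states, interrupts, and a different compiler or optimisation level would all change the coefficients, so they should be re-fitted on the actual target.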

For this kind of time span a single loop would suffice. I'd prefer to use one with DWT_CYCCNT, which has finer granularity and handles interrupts better.

Tips, buy me a coffee, or three.. PayPal Venmo Up vote any posts that you find helpful, it shows what's working..
arnold_w
Senior
Posted on June 13, 2016 at 15:33

How about this, is this better?

#define CYCLES_ELAPSED(_startTime_) ((uint32_t)(((uint32_t)*DWT_CYCCNT) - (uint32_t)_startTime_))

void initialiseDelay(void)
{
    *DEMCR = *DEMCR | 0x01000000;     // Enable the use of the DWT
    *DWT_CYCCNT = 0;                  // Reset cycle counter
    *DWT_CONTROL = *DWT_CONTROL | 1;  // Enable cycle counter
}

void delayMicroseconds(uint32_t delay)
{
    volatile uint32_t startTime;
    uint32_t numNeededClockCycles;
    startTime = (*DWT_CYCCNT);
    numNeededClockCycles = delay * (SystemCoreClock/1000000) - 19; // Compensate for overhead
    while (CYCLES_ELAPSED(startTime) < numNeededClockCycles) {}
}

arnold_w
Senior
Posted on June 13, 2016 at 15:34

I chose to compensate by 19 clock cycles because in the dump file the function occupies 19 clock cycles:

void delayMicroseconds(uint32_t delay)
{
 801a942: b082 sub sp, #8
volatile uint32_t startTime;
uint32_t numNeededClockCycles;
startTime = (*DWT_CYCCNT);
 801a944: 6813 ldr r3, [r2, #0]
 801a946: 9301 str r3, [sp, #4]
numNeededClockCycles = delay * (SystemCoreClock/1000000) - 19; // Compensate for overhead
 801a948: 4b08 ldr r3, [pc, #32] ; (801a96c <delayMicroseconds+0x30>)
 801a94a: 681b ldr r3, [r3, #0]
 801a94c: fbb3 f3f1 udiv r3, r3, r1
 801a950: 4358 muls r0, r3
 801a952: 3813 subs r0, #19
while (CYCLES_ELAPSED(startTime) < numNeededClockCycles) {}
 801a954: 6813 ldr r3, [r2, #0]
 801a956: 9901 ldr r1, [sp, #4]
 801a958: 1a5b subs r3, r3, r1
 801a95a: 4298 cmp r0, r3
 801a95c: d8fa bhi.n 801a954 <delayMicroseconds+0x18>
}
 801a95e: b002 add sp, #8
 801a960: 4770 bx lr
 801a962: bf00 nop
 801a964: 20000010 andcs r0, r0, r0, lsl r0
 801a968: 000f4240 andeq r4, pc, r0, asr #4
 801a96c: 20000008 andcs r0, r0, r8

Posted on June 13, 2016 at 17:13

You could confirm the delay timing with a toggling GPIO and a scope, or by reading the entry/exit values of DWT_CYCCNT. Interrupt loading may cloud the results.
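A minimal sketch of the entry/exit approach, assuming the two counter readings have already been captured into variables around the call being timed. The helper names are hypothetical; the only technique relied on is that unsigned 32-bit subtraction wraps modulo 2^32, so a single counter rollover between the readings still gives the right difference.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed between two readings of a free-running 32-bit counter
 * such as DWT_CYCCNT. Correct across one wrap-around of the counter. */
uint32_t elapsed_cycles(uint32_t entry_count, uint32_t exit_count)
{
    return (uint32_t)(exit_count - entry_count);
}

/* Convert a cycle count to microseconds for a given core clock. */
double cycles_to_us(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0u);
    return (double)cycles * 1e6 / (double)clock_hz;
}
```

On the target, the readings would be taken immediately before and after the delay call. Timing an empty region the same way gives the measurement overhead to subtract.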

I think there are more efficient ways to implement the loop, and ways to do the math that don't lose precision.
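One possible way to do the math, offered as a sketch rather than as the method meant above: the expression `delay * (SystemCoreClock/1000000) - 19` drops any fractional MHz in the integer division, and the unsigned subtraction wraps to a huge value when the product is below 19 (for example 1 us at 16 MHz). The hypothetical helper below multiplies before dividing in 64 bits, rounds up so the delay is never shorter than requested, and clamps at zero after removing the overhead. The 19-cycle overhead is the figure from the post above and is passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cycles to wait for delay_us microseconds at clock_hz,
 * after subtracting a known call overhead. Never wraps below zero. */
uint32_t us_to_cycles(uint32_t delay_us, uint32_t clock_hz,
                      uint32_t overhead_cycles)
{
    assert(clock_hz > 0u);

    /* Multiply first in 64 bits, then divide, rounding up. */
    uint64_t cycles = ((uint64_t)delay_us * clock_hz + 999999u) / 1000000u;

    if (cycles <= overhead_cycles) {
        return 0u; /* requested delay is shorter than the overhead */
    }
    cycles -= overhead_cycles;

    /* Saturate so the result fits the 32-bit cycle counter. */
    return cycles > UINT32_MAX ? UINT32_MAX : (uint32_t)cycles;
}
```

For example, `us_to_cycles(1, 16000000, 19)` returns 0 instead of wrapping, and `us_to_cycles(3, 16500000, 0)` returns 50 where the original expression would give 48. The 64-bit division has a cost of its own on a Cortex-M, so the overhead figure would need to be measured again.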
