STM32F103RB: Timer OC dropping overflow interrupts

dhaselwood · ‎2013-02-03

Posted on February 04, 2013 at 04:05

It appears that enabling the TIM4 OC interrupt on CH2 occasionally causes an Update (overflow) interrupt to be missed.

The situation: a 1 PPS GPS pulse triggers the TIM4 CH1 input capture. The 16-bit counter is extended to a 32-bit time count by adding 0x10000 on each overflow interrupt and loading the IC register into the lower 16 bits on each IC. The case where both flags are set when the isr executes is handled by adding an overflow count if the IC value is less than 32767, and skipping it otherwise. This runs for days without a miss.
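The extension scheme described above can be sketched as a host-testable function. This is only an illustration under assumptions: the names `ext_timer_t` and `ext_timer_service` are hypothetical, the flags and capture value are passed in as arguments rather than read from TIM4, and the choice to still count the overflow for later captures in the "skip" case is my reading of the description, not code from the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold from the post: a capture below this value, seen together with
 * an overflow, is assumed to have happened after the counter wrapped. */
#define LATE_CAPTURE_LIMIT 32767u

/* Hypothetical state: upper 16 bits of the software-extended time. */
typedef struct {
    uint32_t high;   /* always a multiple of 0x10000 */
} ext_timer_t;

/* Service one interrupt. uif/cc1if are the overflow and capture flags as
 * read from the status register, ccr1 is the captured 16-bit count.
 * Returns true and writes *stamp when a capture was pending. */
bool ext_timer_service(ext_timer_t *t, bool uif, bool cc1if,
                       uint16_t ccr1, uint32_t *stamp)
{
    assert(t != NULL && stamp != NULL);

    uint32_t high_before = t->high;
    if (uif) {
        t->high += 0x10000u;          /* count the overflow */
    }
    if (!cc1if) {
        return false;                 /* overflow only (or nothing pending) */
    }

    /* Both flags set: a small capture value means the capture followed the
     * wrap, so it belongs to the new epoch; otherwise it preceded the wrap. */
    bool captured_after_wrap = uif && (ccr1 < LATE_CAPTURE_LIMIT);
    uint32_t high = captured_after_wrap ? t->high : high_before;

    *stamp = high | ccr1;
    return true;
}
```

In a real isr the three inputs would come from a single read of TIM4_SR and a read of TIM4_CCR1; the function itself has no hardware dependency, so the both-flags case can be exercised on a PC.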

The CH2 output compare was added with the goal of generating 64 interrupts per second that are accurately synchronized with the GPS 1 PPS. Because the register is only 16 bits, an increment must be added to the OC register on each OC interrupt to span the 1/64th sec. The increment is computed from the number of ticks between the 1 PPS input captures. When the CH2 interrupt is enabled, the time between 1 PPS interrupts is occasionally 65536 too small, indicating that an overflow count was missed.

The following experiment leads me to suspect that bus timing is involved. First, this code causes many cases of missing overflow counts:

TIM4_CCR2 += ticks /1280;

(Where 'ticks' would be something like 64000253).

However, the following runs for long periods of time (e.g. an hour) before an error:

x = ticks / 1280;

y += x;

TIM4_CCR2 = y;
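As a rough illustration of the arithmetic in the two snippets above, here is a hardware-free sketch of the compare-value update. The function names and the `STEPS_PER_SECOND` constant are hypothetical; the divisor 1280 and the example value of `ticks` are taken from the post. It shows that the 16-bit compare value is expected to wrap, and how many ticks per second the integer division leaves unaccounted for.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor used in the post: the measured ticks per second are split into
 * this many output-compare steps. */
#define STEPS_PER_SECOND 1280u

/* Next 16-bit compare value: add the per-step increment and let the
 * result wrap modulo 65536, as the hardware register would. */
uint16_t next_compare(uint16_t ccr, uint32_t ticks_per_second)
{
    uint32_t step = ticks_per_second / STEPS_PER_SECOND;
    assert(step <= 0xFFFFu);            /* step must fit the 16-bit timer */
    return (uint16_t)(ccr + step);
}

/* Ticks per second lost to integer division when every step is equal. */
uint32_t leftover_ticks(uint32_t ticks_per_second)
{
    return ticks_per_second % STEPS_PER_SECOND;
}
```

With ticks = 64000253 the step is 50000, so the compare value goes 50000, then 34464 after wrapping, and 253 ticks per second are not covered by the equal steps.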

Another, possibly related, curiosity: for the input capture of 1 PPS times (with the OC neither running nor involved) there are four possible states of the status register when the isr executes:

01 overflow only

10 input capture only

11 both on (requiring a test to see which was first)

00 which appears to be an overflow and an input capture coinciding, i.e. the result is correct if this is handled as an 11 case.

I observed the '00' case 11 times over approximately a 12 hour period.

My hunch is that the problem when the OC was added has a similar cause as the '00' input capture case.

BTW, I tried running the bus at 1/2 the speed and the problem remained. Also, I do a readback of the SR before exiting the isr to avoid tail-chaining (and tail-chaining would result in extra, not missing, counts).

Any ideas as to what is causing the OC to disrupt the overflow and IC interrupts?

Tesla DeLorean · ‎2013-02-03

Posted on February 04, 2013 at 06:44

I'd really want to get to the bottom of the 00 case, because it's indicative of reentrant behaviour.

I might do a couple of things. The first would be to time stamp my interrupt entry, and perhaps exit, times from a 32-bit time base (DWT_CYCCNT jumps out due to its very good granularity and close to 1 minute wrap time). This would allow you to place the 00 with respect to the prior service, and perhaps let you watch the 1PPS sawtooth and slew.

The second would be to place a bunch of delay at the back end of the service routine before leaving, perhaps a couple of hardware divides to allow the write buffer(s) to retire before letting the NVIC decide what happens next.

Another thing I might do is log the CNT value at these events.
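The logging suggested above could look roughly like the sketch below. It is an assumption-laden illustration, not code from the thread: the names `isr_log_t`, `isr_log_put` and `cycles_between` are made up, and the cycle count is passed in as an argument so the code runs on a host. On the target it would be read from DWT_CYCCNT (0xE0001004 on Cortex-M3) after setting TRCENA in DEMCR and CYCCNTENA in DWT_CTRL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_SIZE 64u   /* must be a power of two for the index mask */

/* One record per interrupt entry. */
typedef struct {
    uint32_t cycles;   /* free-running 32-bit cycle count at isr entry */
    uint16_t sr;       /* timer status register as read on entry */
    uint16_t cnt;      /* timer CNT value at entry */
} isr_event_t;

/* Ring buffer: the newest LOG_SIZE events overwrite the oldest. */
typedef struct {
    isr_event_t ev[LOG_SIZE];
    uint32_t head;     /* total number of events written so far */
} isr_log_t;

void isr_log_put(isr_log_t *log, uint32_t cycles, uint16_t sr, uint16_t cnt)
{
    assert(log != NULL);
    isr_event_t *e = &log->ev[log->head & (LOG_SIZE - 1u)];
    e->cycles = cycles;
    e->sr = sr;
    e->cnt = cnt;
    log->head++;
}

/* Elapsed cycles between two stamps. Unsigned subtraction gives the right
 * answer across a single wrap of the 32-bit counter. */
uint32_t cycles_between(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

Dumping the buffer from the mainline after a '00' event would show how far that entry was from the previous service, and what CNT was at the time.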

Without knowing these things I'm fishing a bit. Getting some more symptoms might get a better diagnosis.

What kind of clock source are you driving the STM32 with?

I might also be tempted to ignore the overflow completely, and just look at the capture with respect to the aforementioned timebase, as it's synchronous to the timer with a stupidly long wrap time compared to the input periodicity.

You are sure the 1PPS isn't blanking?

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

dhaselwood · ‎2013-02-04

Posted on February 04, 2013 at 18:10

Tesla DeLorean · ‎2013-02-04

Posted on February 04, 2013 at 18:38

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337g/BABJFFGJ.html

https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/DispForm.aspx?ID=11943&RootFolder=/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Duration%20of%20FLOAT%20operations

http://forums.arm.com/index.php?/topic/13949-cycle-count-in-cortex-m3/

dhaselwood · ‎2013-02-04

Posted on February 05, 2013 at 03:59

dhaselwood · ‎2013-02-07

Posted on February 07, 2013 at 17:45

Not using bit-banding to reset the interrupt flag may have solved the problem. I was using bit-banding to write a zero to the flag bit. For example, changing to:

TIM4_SR = ~0x1;

appears to eliminate the dropped OC overflows. I did not see a single case in an 11 hour test last night.

Here is the test that shows the problem--

OC register is set to a fixed value

IC is driven from 1 PPS GPS

OC, IC, and overflow interrupts are all enabled.

No other interrupts on the system. USART2 used for output, using a non-interrupt polling routine.

In the isr counters are incremented for each IC interrupt and OC interrupt.

Upon each IC interrupt the counters are saved (along with a flag). The mainline routine displays the difference between the latest and previous counts. With the clock running at a measured 64,000,175 ticks per sec one would expect about 976 interrupts (clock/65536) for the IC and for the overflow. When the OC register was 0x37-0x3f the OC count would sometimes be "small," ranging from 0 to about 600, sometimes intermittently. 0x3b seemed to be the "sweet spot," where it would be zero continuously. The OC interrupt enable bit was monitored and shown to be on. The OC flag never got set. Logically, the OC flag should be set on each cycle of CNT, yet in the worst case the isr never saw it as being on.

Experimenting with the length of the code after resetting the overflow flag and before the 'return' from the isr, I was able to move the "bad" value range for the OC a little. That led to replacing the bit-banding for the overflow and IC flag resets with simply writing a word with all bits on except the flag bit. The problem appeared to go away.

Clearly, resetting an interrupt flag with bit-banding versus writing a word to the hardware address is different when it comes to what is going on at the gate-level logic timing.

Leading up to the discovery was a test situation where the OC register was being changed upon each interrupt. When it happened to hit 0x3b the interrupts stopped, but not forever. After a period of time, ranging from maybe 10 secs to a number of minutes, it would start up again. It appears that the timing of the 1 PPS and/or the resetting of the OC/IC flags was involved, and the OC stall might "unlock."

While empirical evidence indicates that the change from bit-banding solves the problem it would be much more satisfying to have an explanation based on how the hardware operates.

BTW, the IC/overflow '00' case would only happen when the OC interrupt was enabled (and, of course, bit-banding reset was being used).

Tesla DeLorean · ‎2013-02-07

Posted on February 07, 2013 at 18:24

Isn't the explanation as simple as the RMW being done by the BB operation writing back zero bits to subsequently asserted (post R phase) interrupt bits in the SR?

The atomic nature of the RMW is limited to the processor interaction, not what the peripheral is doing in the roughly 8 cycle window.

The SR is designed to be cleared by a single write of the inverse mask, i.e. with the bit you want to clear as zero and all other bits as one.
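The hazard described above can be reproduced with a small host-side model. This is a sketch under stated assumptions, not a hardware-accurate simulation: it treats the status register as write-0-to-clear (writing 1 leaves a bit unchanged), models the bit-band write as a plain read-modify-write, and lets one hardware event land between the read and the write. The function names are hypothetical; UIF and CC2IF are bits 0 and 2 of TIMx_SR.

```c
#include <assert.h>
#include <stdint.h>

#define UIF   0x0001u   /* update (overflow) flag, TIMx_SR bit 0 */
#define CC2IF 0x0004u   /* capture/compare 2 flag, TIMx_SR bit 2 */

/* Model of a write-0-to-clear register: a written 0 clears the bit,
 * a written 1 leaves it as it was. */
static uint16_t sr_write(uint16_t sr, uint16_t value)
{
    return sr & value;
}

/* Bit-band style clear: the bus reads the word, clears one bit in the
 * copy, and writes the whole word back. hw_event models a flag that the
 * timer sets after the read but before the write. */
uint16_t clear_flag_rmw(uint16_t sr, uint16_t flag, uint16_t hw_event)
{
    assert(flag != 0u && (flag & (flag - 1u)) == 0u);  /* exactly one bit */
    uint16_t snapshot = sr;                  /* R */
    sr |= hw_event;                          /* flag arrives mid-sequence */
    snapshot &= (uint16_t)~flag;             /* M */
    return sr_write(sr, snapshot);           /* W: zeros hit the new flag too */
}

/* Direct clear: one write of the inverse mask, no read involved. */
uint16_t clear_flag_direct(uint16_t sr, uint16_t flag, uint16_t hw_event)
{
    assert(flag != 0u && (flag & (flag - 1u)) == 0u);
    sr |= hw_event;                          /* same event, same timing */
    return sr_write(sr, (uint16_t)~flag);    /* only `flag` is written as 0 */
}
```

In this model, clearing UIF with the read-modify-write while CC2IF arrives in between leaves the register at 0, so the compare interrupt is lost; the direct write leaves CC2IF set.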

dhaselwood · ‎2013-02-07

Posted on February 07, 2013 at 19:24

Tesla DeLorean · ‎2013-02-07

Posted on February 07, 2013 at 20:17

Excellent point about the BB being a processor function. Having the flag come on between the R and the W of the RMW makes sense. If that is true, then it means that BB cannot be used to set/reset a bit in hardware registers whose contents might change at any time.

What is still a bit bothersome is that if it is a case of a TIM4 flag coming on during a RMW, thus writing a zero into a flag that just came on after the R, the timing would have to be rather precise, i.e. down to a couple of 15.6 ns ticks. Given the jitter on the GPS 1 PPS it seems a bit of a stretch to see the OC "locked" for a number of minutes. It would, however, account for the variable "locked" time I was seeing: there are periods with a particularly stable 1 PPS, e.g. sometimes varying only 3 or 4 ticks between seconds, and then times when the jitter is larger as the S/N estimate in the GPS Kalman filtering changes.

Thanks again. I think you may have identified the cause of the problem.

I think the window is considerably more than a cycle. The tail end of the W phase here is basically a synchronous AND with the current content of the register, with the new interrupt OR'd on. Each APB access you can figure at about 4 bus cycles, perhaps longer in the write buffer. I'm sure someone with gate level design access could quantify the hazard, but I'd guess it's closer to 111.1 ns than 13.9 ns (72 MHz).

Just writing the mask would have no window.

So yes I'd definitely want to steer clear of BB Write on self/special clearing (ie non memory) registers. A BB Read to extract a single bit state should be Ok.

waclawek.jan · ‎2013-02-07

Posted on February 08, 2013 at 08:18

Is it just me, or do all of Don's posts display only after "Show quoted messages" is clicked (which requires javascript being on) for others, too?

JW