STM32F103VET6: How long before the interrupt disable flag takes effect?

dhaselwood · ‎2011-08-16

Posted on August 16, 2011 at 18:38

I isolated a "glitch" in a Timer 2 input capture routine. It appears that the interrupt is not actually disabled until a few cycles after the instruction that disables it.

Can you point me to a document & section that details the timing?

Below is a snippet of the code where the issue was seen.

The intention is to disable interrupts, then copy three values: the extended input capture time, a counter incremented by each interrupt, and a counter associated with (and incremented by) RTC interrupts. After the copying is complete, the TIM2 input capture interrupt (and the update event interrupt used for counting overflows) are re-enabled.

If an interrupt is serviced between the first and second copy instructions, the mainline sees the interrupt count increased, signifying a new input capture, but the first value copied is the old input capture time. The conclusion must be that the interrupts were not disabled.

Removing the re-enable instruction proves that the disable instruction does indeed stop the interrupts, but it does not prove when the disabling took place. Placing dummy code after the instruction that disables the interrupts eliminates the "glitch", which indicates that the disabling does not take effect immediately, but only after a number of cycles have elapsed.

In this case the processor is running at 48 MHz and APB1 & APB2 at 12 MHz. One line of thinking is that the TIM2 requires one APB1 cycle (which would be four of the processor cycles).

My finding is empirical and I would like to find some documentation that explains it.

volatile int Tim2_dummy;

struct TIMCAPTRET32 Tim2_inputcapture_ui(void)
{
	struct TIMCAPTRET32 strY; // 32b input capture time and flag counter

	TIM2_DIER &= ~(TIM_DIER_CC2IE | TIM_DIER_UIE); // Disable CH2 capture interrupt and counter overflow (p 315)
	// Tim2_dummy += 1;
	__asm__("NOP"); // Wait for event
	strY.ic = strTim2m.ui[0]; // Get 32b input capture time
	strY.flg = usTim2ch2_Flag; // Get flag counter
	strY.cnt = uiRTCsystemcounterTim2IC; // Get RTC_CNT tick counter saved at last input capture
	TIM2_DIER |= (TIM_DIER_CC2IE | TIM_DIER_UIE); // Enable CH2 capture interrupt and counter overflow (p 315)

	return strY;
}

Tesla DeLorean · ‎2011-08-16

Posted on August 16, 2011 at 19:04

The processor is pipelined, and it also contains a "write buffer".

When you write to a peripheral bus that's running slower, the write is posted and the processor continues to execute. NOPs will just stuff the pipeline; reading back the peripheral register will be more effective.

Things will not block until you want to write some more data to the buffer, or you force synchronization by reading.
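A minimal sketch of the read-back approach, applied to the kind of disable used in the original routine. The helper name tim_irq_disable_sync is hypothetical, and the bit definitions are written out on the assumption that they match the TIMx_DIER layout in the reference manual (UIE = bit 0, CC2IE = bit 2). The register is passed as a pointer so the fragment is not tied to any particular header.

```c
#include <stdint.h>

/* Assumed TIMx_DIER bit positions (check against the reference manual). */
#define TIM_DIER_UIE   (1u << 0)   /* update interrupt enable */
#define TIM_DIER_CC2IE (1u << 2)   /* capture/compare 2 interrupt enable */

/* Hypothetical helper: clear the enable bits, then read the register back.
 * The write may be posted; the volatile read that follows is an access to
 * the same peripheral and has to wait for it, so execution stalls here
 * until the new value has actually reached the timer. */
static inline void tim_irq_disable_sync(volatile uint32_t *dier, uint32_t mask)
{
    *dier &= ~mask;   /* read-modify-write; the store may still be in flight */
    (void)*dier;      /* read back to force synchronization */
}
```

In the routine above this would replace the first TIM2_DIER line and the NOP, for example tim_irq_disable_sync(&TIM2_DIER, TIM_DIER_CC2IE | TIM_DIER_UIE), assuming TIM2_DIER expands to a volatile 32-bit lvalue.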

One specific manifestation of the problem: if you clear a pending IRQ on a peripheral immediately before returning, the interrupt state will not have cleared by the time the tail-chaining decision is made, and the service routine will re-enter.
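A sketch of a handler body arranged to avoid that re-entry: clear the flag first, do the work, and read the status register back before returning. The function name and the counter are hypothetical, the flag position assumes the TIMx_SR layout (CC2IF = bit 2), and the status flags are assumed to be write-0-to-clear, so writing the inverted mask clears only the one flag.

```c
#include <stdint.h>

#define TIM_SR_CC2IF (1u << 2)   /* assumed: capture/compare 2 interrupt flag */

static volatile uint32_t capture_count;   /* hypothetical event counter */

/* Hypothetical handler body; the status register is passed in as a pointer
 * so the fragment does not depend on a particular device header. */
static inline void tim_cc2_isr_body(volatile uint32_t *sr)
{
    if (*sr & TIM_SR_CC2IF) {
        *sr = ~TIM_SR_CC2IF;   /* clear early: write 0 to CC2IF, 1 elsewhere */
        capture_count++;       /* the actual work of the handler */
    }
    (void)*sr;                 /* read back so the clear lands before return */
}
```

Clearing the flag at the top of the handler gives the write the most time to propagate; the final read covers the case where the handler is very short.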

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

dhaselwood · ‎2011-08-16

Posted on August 16, 2011 at 20:03

Thanks Clive1. I'll follow your lead and dig into this some more.

Does a write followed by a readback of the register then stall execution until the readback completes?

My experimenting since my original post shows that 4 NOPs work and 3 fail. Looking at the assembly listing, there are the 4 NOPs plus an instruction that loads the address of the value to be copied, followed by an instruction that loads the value itself into a register. These are 16-bit instructions, so I presume they execute in one cycle each. This makes 6 cycles following the 'str' that clears the interrupt enable flags.

Tesla DeLorean · ‎2011-08-16

Posted on August 16, 2011 at 21:47

The pipeline gives the appearance of one-cycle throughput, but the latency is higher.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337e/CACCIFED.html

All memory accesses (load/store) occur in program order; otherwise serious synchronization issues would occur. This doesn't preclude the CPU from fetching from tightly coupled memory.

Assume writes are deferred, and will propagate across the bus(es), and synchronize to slower clocks as they get to their final destination.

If you store to a peripheral register and then load it back, the operations will occur back-to-back, and execution will stall until the read completes.

dhaselwood · ‎2011-08-16

Posted on August 16, 2011 at 22:04

Clive1,

Thanks. Very helpful.

Your mention of tail-chaining also led to an explanation of a vexing issue I had some time back with SPI1. My routine was simple and skipped testing the interrupt flag, since it was the only interrupt enabled. I would get one "flagless" interrupt with each correct interrupt. I scrapped the code (recovering it from svn would take some work), but clearly what was happening was that the interrupt flag reset had not completed when the return began executing. The result was a re-entry into the interrupt routine with no flag set, as the reset had completed by then.

Thanks again.

Tesla DeLorean · ‎2011-08-16

Posted on August 16, 2011 at 23:36

One of the forum members was having that kind of issue, and it was pretty easy to replicate.

[DEAD LINK] Interrupt entering twice (thread: "Timer update event interrupt retriggering after exit")

It's one of those hardware hazards you have to watch for. It makes the CPU design easier if you're not enforcing all kinds of interlocks and bubbles in the pipeline hardware, especially ones from buses external to the processor. Be grateful you aren't using the Itanium, where you were expected to start register loads many cycles before you planned on actually using the register. There were software tools to detect hazards and do instruction reordering and NOP stuffing.

dhaselwood · ‎2011-08-20

Posted on August 20, 2011 at 18:35

Clive1,

Would a double write to the register also avoid the problem (of interrupts not being disabled when the instruction following the 'str' to the hardware register is executed)? From my testing this appears to be the case.

The sequence would be: read the timer register to a temp value (which ends up in R1), do the bit clear, then store the value back to the timer register, followed by a second store. The code generated is exactly the same as the code for readback, except that the 'ldr' for readback becomes a 'str'. However, the effect on what takes place might be different. The first write loads the write buffer, and the next instruction cannot load the write buffer until the first write completes. With the readback, it is not quite as clear what stalls execution until the write has completed.

Studying the ARM docs to get a better understanding of it all, a friend and I figured a 'DSB sy' (or possibly 'DMB sy') following the instruction that disables the interrupt flags would work, but testing proved this not to be the case.

Clearly, we have solutions to the problem, so the interest now is in getting a solid understanding of how it all behaves.

Tesla DeLorean · ‎2011-08-21

Posted on August 21, 2011 at 15:05

With the back-to-back writes, the second write will stall the pipeline until the content of the first is delivered to, and accepted by, the memory subsystem. Depending on how the subsystem is architected, it may take several more cycles to percolate to its final destination. These cycles won't stall the pipeline any more, but will impact latency.

Reads, on the other hand, can't be deferred. These will stall the execution pipeline until the data is delivered by the memory subsystem.

As there isn't any caching of the RAM/peripherals, doing repetitive reads should expose the latency of the operation, and provide a good estimate for what a write will cost. Doing repetitive writes should expose the throughput of the write buffering to the memory subsystem. The trace unit in the core has a cycle counter that is useful to benchmark such operations.
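A rough sketch of that kind of measurement using the DWT cycle counter. The addresses and bit positions are the Cortex-M3 core debug registers as I understand them (DEMCR, DWT_CTRL, DWT_CYCCNT); the function names are hypothetical, and the registers are passed as pointers so the fragment also builds off-target. The result includes loop overhead, so it is an estimate rather than an exact per-access figure.

```c
#include <stdint.h>

/* Cortex-M3 core debug registers (assumed addresses, check the ARM TRM). */
#define DEMCR_ADDR          0xE000EDFCu
#define DWT_CTRL_ADDR       0xE0001000u
#define DWT_CYCCNT_ADDR     0xE0001004u
#define DEMCR_TRCENA        (1u << 24)   /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA  (1u << 0)    /* starts the cycle counter */

/* Hypothetical helper: turn on the cycle counter and zero it. */
static inline void cyccnt_enable(volatile uint32_t *demcr,
                                 volatile uint32_t *dwt_ctrl,
                                 volatile uint32_t *cyccnt)
{
    *demcr |= DEMCR_TRCENA;
    *cyccnt = 0;
    *dwt_ctrl |= DWT_CTRL_CYCCNTENA;
}

/* Hypothetical helper: cycles spent doing n reads of reg (plus loop cost). */
static inline uint32_t time_reads(volatile uint32_t *cyccnt,
                                  volatile uint32_t *reg, uint32_t n)
{
    uint32_t start = *cyccnt;
    for (uint32_t i = 0; i < n; i++) {
        (void)*reg;            /* each volatile read stalls until data returns */
    }
    return *cyccnt - start;    /* unsigned subtraction tolerates one wraparound */
}
```

On the target the pointers would be formed from the address macros, for example (volatile uint32_t *)DWT_CYCCNT_ADDR, and reg would point at the peripheral register of interest. Swapping the read in the loop for a write gives the corresponding write-throughput estimate.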

As I understand the flash controller on the STM32F1, the first read of a flash-line (64-bits?) is charged the full cost of reading flash (30-40ns?), subsequent reads of the same line can be delivered immediately, and meanwhile the next line is prefetched, hopefully before you need it, or branch elsewhere.

There is a big issue with writing to flash on the STM32 while executing code from flash. The design permits this, but reads from flash (instructions/data) will stall the execution pipeline very significantly (see write/erase timings), to the point where it can't service peripherals, and things like the USART receive buffer overflow.

dhaselwood · ‎2011-08-21

Posted on August 21, 2011 at 18:18

epassoni950 · ‎2011-08-23

Posted on August 23, 2011 at 11:42

For your problem, I think that using a DSB instruction is better. The definition of this instruction is exactly what you need:

Instructions that come after the DSB, in program order, do not execute until the DSB instruction completes. The DSB instruction completes when all explicit memory accesses before it complete.

I think it's better than using a second write, because the second write needs a new access to the peripheral register.
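For reference, a sketch of what the DSB variant might look like with GCC-style inline assembly. The helper names are hypothetical, and the barrier is only emitted when building for ARM; elsewhere it falls back to a compiler barrier so the fragment still compiles. Note that testing reported earlier in this thread found that DSB did not remove the glitch on this part, so this needs to be verified on hardware before relying on it.

```c
#include <stdint.h>

/* Hypothetical wrapper around the DSB instruction (GCC/Clang syntax). */
static inline void data_sync_barrier(void)
{
#if defined(__arm__) || defined(__thumb__)
    __asm__ volatile ("dsb sy" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory");   /* off-target: compiler barrier only */
#endif
}

/* Hypothetical helper: clear interrupt enable bits, then issue the barrier. */
static inline void tim_irq_disable_dsb(volatile uint32_t *dier, uint32_t mask)
{
    *dier &= ~mask;        /* store to the peripheral register */
    data_sync_barrier();   /* wait for outstanding explicit accesses */
}
```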

Eric