2012-09-14 5:29 AM
I'm working on a project on a STM32F4 CPU, generating signals (VGA) with DMA.
I have a generic timer on CPU clock (no prescaler) on a STM32 triggering interrupts on overflow, to generate a periodic signal with GPIO afterwards.
I need to trigger the GPIO at a very regular time (basically down to CPU-cycle precision). I've managed to reduce the jitter to +-5 cycles by setting priorities etc., but some jitter remains, depending on what the CPU was doing and which instruction was interrupted.
I need to compensate for these few cycles of jitter. Adding a few more cycles of latency isn't a problem as long as I always toggle the GPIOs at the same counter cycle.
My idea was to read the current value of the counter after the interrupt, and have an active loop of (FIXED_NUMBER-CNT->VAL) time, ensuring I would exit the loop at precise times.
However, doing a simple loop in C - being a FOR loop, or a while(counter->value < TARGET); doesn't work as it ADDS jitter instead of reducing it.
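The arithmetic behind this idea can be sketched in C as follows. This is only an illustration, assuming an up-counting timer whose count is sampled once at interrupt entry; `TARGET_TICK` and `padding_ticks` are hypothetical names, not from any ST header. It only computes how much to wait; burning those ticks without adding jitter is the actual open problem here.

```c
#include <stdint.h>

/* Hypothetical: the counter value at which the GPIO write should happen.
 * It must be larger than the worst-case interrupt latency. */
#define TARGET_TICK 40u

/*
 * Given the counter value sampled at ISR entry, return how many ticks
 * remain until TARGET_TICK. Returns 0 if the target has already passed,
 * which means TARGET_TICK was chosen too small.
 */
uint32_t padding_ticks(uint32_t cnt_at_entry)
{
    return (cnt_at_entry < TARGET_TICK) ? (TARGET_TICK - cnt_at_entry) : 0u;
}
```

For example, an ISR entered at count 35 would need 5 more ticks, and one entered at count 38 would need 2, so both end up acting at count 40.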
I checked this with an empty, non-optimized loop body that doesn't hit memory (asm("")).
For comparison, see this example on AVR (more predictable timings):
http://lucidscience.com/pro-vga video generator-7.aspx
(search for "jitter")
I also tried a simple loop in assembly, such as the following (r0 holds the number of cycles to wait to compensate for the counter value):
loop: SUBS r0,#1 ; tried with 2 also
BGE loop
and, again, jitter is better without it.
To sum it up, I already know how much I should delay. Unfortunately, branches alone don't seem to work (nondeterministic pipeline refill?) and IT conditional expressions don't either, because they always take the same number of cycles (sometimes doing nothing).
Would running from RAM instead of flash improve consistency ?
Maybe I'm out of my league here... any help would be appreciated, thanks! (Cross-posted from Stack Overflow, as I think I'd have more success here than there, sorry.)
#interrupts-counter
2012-09-14 6:33 AM
Look at the core's cycle counter in the trace unit.
//******************************************************************************
// From http://forums.arm.com/index.php?showtopic=13949
volatile unsigned int *DWT_CYCCNT = (volatile unsigned int *)0xE0001004; //address of the register
volatile unsigned int *DWT_CONTROL = (volatile unsigned int *)0xE0001000; //address of the register
volatile unsigned int *SCB_DEMCR = (volatile unsigned int *)0xE000EDFC; //address of the register
//******************************************************************************
void EnableTiming(void)
{
    static int enabled = 0;
    if (!enabled)
    {
        *SCB_DEMCR = *SCB_DEMCR | 0x01000000;
        *DWT_CYCCNT = 0; // reset the counter
        *DWT_CONTROL = *DWT_CONTROL | 1 ; // enable the counter
        enabled = 1;
    }
}
//******************************************************************************
void TimingDelay(unsigned int tick)
{
    unsigned int start, current;
    start = *DWT_CYCCNT;
    do
    {
        current = *DWT_CYCCNT;
    } while((current - start) < tick);
}
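A note on the `(current - start) < tick` comparison above: the cycle counter is 32 bits wide and the subtraction is unsigned, so the difference stays correct even when the counter wraps between the two reads. Below is a minimal sketch of that arithmetic, plus a peak-to-peak helper for judging jitter from a handful of measurements. `cycles_elapsed` and `jitter_spread` are hypothetical helper names, not part of CMSIS, and the code is plain C so it can be tried off-target.

```c
#include <stddef.h>
#include <stdint.h>

/* Cycles elapsed between two counter samples. Unsigned arithmetic is
 * modulo 2^32, so this is correct across a single counter wrap. */
uint32_t cycles_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Peak-to-peak spread (max - min) of a set of measured latencies.
 * Returns 0 for an empty set. */
uint32_t jitter_spread(const uint32_t *samples, size_t n)
{
    if (n == 0) {
        return 0u;
    }
    uint32_t lo = samples[0];
    uint32_t hi = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }
    return hi - lo;
}
```

On the target, `start` and `now` would be two reads of `*DWT_CYCCNT` taken around the code being measured.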
//******************************************************************************
2012-09-14 8:11 AM
Thanks a lot for answering.
I think that, besides reading the counter at each iteration, this is about the same loop I use. What is the accuracy of this kind of counter? Can it be cycle-accurate (CPU cycle)? i.e., if I do the following:
for (int i=0;i<100000;i++)
{start = *DWT_CYCCNT
2012-09-14 12:56 PM
STM32F4 @ 168 MHz
From FLASH: 37 49 37 49
From RAM:   39 39 39 39
2012-09-14 1:26 PM
wow. thanks. you're da man.
2013-01-28 5:49 PM
Hi clive1!
I ran into a similar issue in my project recently: I can't manage to start shifting out the data at the exact moments I want to.
I'm shifting out the data over SPI to the screen (PAL signal), and the shifting is triggered by horizontal sync pulses fed to the MCU by an LM1881.
I use an STM32F4RG MCU. TIM3 is configured to be reset by a rising edge on the TIM3 IN2 line, and also to generate an interrupt when the counter equals 100. The timer is configured to run at the maximum possible clock. The MCU runs at 108 MHz, APB1 at 27 MHz, and APB2 at 54 MHz.
In the TIM3 compare interrupt handler, I wait for the counter to reach a predefined constant value to get a 4 usec pause.
The loop looks like this:
U32 volatile *p = &(TIM3->CNT);
U32 volatile waittime = 840;
... configuring DMA to serve SPI ...
do{}while(*p < waittime);
SPI2->CR1 |= SPI_CR1_SPE;
SPI3->CR1 |= SPI_CR1_SPE;
...
I'm using Keil, and this gives a nice and clean wait loop in assembly.
The problem is this: if I don't have a heavy workload in the main program loop, it all looks fine. But if I have some massive FP or integer calculations there, the moment the shifting out starts is no longer deterministic. I see delays of a few cycles...
Is there a way to flush the pipeline, caches, etc., to make this moment deterministic? I don't care much about the interrupt latency. I do care about starting the shift-out exactly when the timer counter reaches a certain value (the value can be tweaked, no problem), and this moment should be as precise as possible.
2013-01-28 7:57 PM
ISB flushes the pipeline
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0179b/ar01s02s04.html
DSB and DMB work with the data buffers, basically fencing operations. Not sure you can impact the "cache" for the flash within ST's ART unit, as this is outside the core. Some of the FPU instructions run 14 cycles.
2013-01-29 2:24 AM
Thanks, clive1!
Yes, FP operations can be long, but I spend >700 TIM3 ticks in that waiting loop, so all operations are surely complete by the time the loop exits.
I actually tried ISB, DMB, and DSB; I just didn't mention it, as they didn't help much.
Is there a possibility that the hardware TIM3 counter clear can be affected by a few cycles by what the MCU is doing in the main loop?
Also, one observation: the timing inaccuracies also appear when I enable the SysTick interrupt, which kicks in every 10 msec.
I have another idea on how to overcome this. As a sync pulse is a falling edge, 4 usec, then a rising edge, I can enter an EXTI interrupt triggered by the falling edge, then spin in a loop that checks something like a MustExit variable. Then the hardware clears the TIM3 counter on the rising edge, and I enter the TIM3 CC interrupt when the counter reaches 100. Then I wait for the counter to reach 850 and start shifting out the data. After I've started it, I set the MustExit variable so the EXTI interrupt can complete.
In this case the MCU will be spinning in my simple loop at the moment of entering the TIM3 ISR, and nothing should disturb the MCU core much.
Does this seem like a viable solution? I'll try this and reply back soon.
2013-01-29 5:02 AM
Maybe you can play with the NVIC priorities to make sure the TIM interrupt isn't preempted, but the SysTick, etc are?
Reading the APB takes 4 cycles as I recall.
2013-01-29 5:13 AM
And you might review your code with regard to FPU usage and context saving - see
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/BABJFDJB.html
Disable FPU context saving if you don't need it, as it adds several cycles to interrupt entry/exit.
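As a rough sketch of what that could look like: on the Cortex-M4F, automatic and lazy FPU state preservation are controlled by the ASPEN (bit 31) and LSPEN (bit 30) bits of the FPCCR register at 0xE000EF34. The function name below is hypothetical, and the register is passed in as a pointer so the bit manipulation can be tried off-target. This is only safe if no interrupt handler touches the FPU, because the FPU registers are then no longer saved automatically.

```c
#include <stdint.h>

#define FPCCR_ADDR   0xE000EF34u   /* Floating-Point Context Control Register */
#define FPCCR_ASPEN  (1u << 31)    /* automatic FP state preservation enable */
#define FPCCR_LSPEN  (1u << 30)    /* lazy FP state preservation enable */

/* Clear ASPEN and LSPEN so exception entry no longer stacks FPU state.
 * All other bits of the register are left unchanged. */
void fpu_disable_context_save(volatile uint32_t *fpccr)
{
    *fpccr &= ~(FPCCR_ASPEN | FPCCR_LSPEN);
}

/* On target (assumption: called once at startup, before interrupts are enabled):
 *     fpu_disable_context_save((volatile uint32_t *)FPCCR_ADDR);
 */
```

With CMSIS headers the same register is available as `FPU->FPCCR`.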