2015-08-06 01:52 AM
Hi,
Was wondering if anyone has any help for me on this.
Device: STM32F334R8T6 (on the Nucleo)
Frequency: 64 MHz (double-checked by outputting the sysclock on MCO), 2 flash wait states
IDE + compiler: SW4STM32 (free, Eclipse-based)
Std library: V1.1.1 (4 April 2014) STM32F30x Standard Peripherals Library Drivers
I am reading a single-wire bitstream with a max speed of 250 kbps (4us a bit). I'm using TIM2 input capture channels 1 & 2 to read the rising and falling edge times, but I'm finding the interrupt latency is too long (5us!).
Eventually, in order to debug, I 'copied' the data stream in the interrupt onto another output pin: when it entered the rising edge interrupt I'd output a high on a different pin, and on the falling edge a low, nothing else. The latency between the input changing and the output pin following it from the interrupt was around 5us! So it would miss bits (because they're 4us) and it would be stuck servicing the interrupt all the time.
Is this kind of latency normal? I was expecting maybe maximum 1us or so. 12cycles @ 64Mhz = 187ns.
Has anyone seen this kind of latency?
#timer #stm32 #interrupt
2015-08-06 02:20 AM
When the material printed on glossy paper says "fast deterministic interrupt handling for Cortex-M", read it as "it's not as dam'd slow and jittery as in other ARMs, but still far worse than in real microcontrollers". Don't expect too much, and read the fine manual, in this case the Cortex-M4 Technical Reference Manual by ARM, chapter 3.9.1:
"There is a maximum of a twelve cycle latency from asserting the interrupt to execution of the first instruction of the ISR[...]"
For the rest of it, much depends on your code and the optimization performed by the compiler (check your settings).
> 2 flash wait states
This of course slows things down further.
JW
2015-08-06 02:49 AM
All very well, but 5-6us!?
My code looks like this, and main is sitting in a while loop doing nothing else.
void TIM2_IRQHandler(void)
{
    u8 risingEdgeStatus;
    u8 fallingEdgeStatus;

    risingEdgeStatus = TIM_GetITStatus(TIM2, TIM_IT_CC2);
    fallingEdgeStatus = TIM_GetITStatus(TIM2, TIM_IT_CC1);

    /* TIM2_CH1 Input capture Falling edge */
    if (fallingEdgeStatus != RESET)
    {
        GPIO_WriteBit(GPIOA, GPIO_Pin_5, 0);
    }

    /* TIM2_CH2 Input capture Rising edge */
    if (risingEdgeStatus != RESET)
    {
        GPIO_WriteBit(GPIOA, GPIO_Pin_5, 1);
    }

    TIM_ClearITPendingBit(TIM2, TIM_IT_CC2);
    TIM_ClearITPendingBit(TIM2, TIM_IT_CC1);
}
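For scale, the figures quoted so far can be converted between time and core clock cycles. This is only a small arithmetic sketch: it assumes the 64 MHz core clock from the first post, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Core clock as stated in the first post (assumption: SYSCLK == HCLK == 64 MHz). */
#define SYSCLK_HZ 64000000u

/* Convert a duration in nanoseconds to core clock cycles, rounded down. */
static inline uint32_t ns_to_cycles(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns * SYSCLK_HZ) / 1000000000u);
}

/* Convert a number of core clock cycles to nanoseconds, rounded down. */
static inline uint32_t cycles_to_ns(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / SYSCLK_HZ);
}
```

With these, cycles_to_ns(12) gives 187, ns_to_cycles(4000) gives 256 and ns_to_cycles(5000) gives 320. In other words one 4us bit is 256 cycles, and the observed 5us is about 320 cycles, of which the 12-cycle interrupt entry is only a small part.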
2015-08-06 02:50 AM
> I was expecting maybe maximum 1us or so. 12cycles @ 64Mhz = 187ns.
Ah, I've just noticed you are aware of the 12 cycles. There's still some code involved, and the compiler's optimization. 5us sounds a tad too much though.
Are you sure you are running at 64MHz? What's the APB clock on the bus where the timer sits? Are there any other interrupts running? Did you set some of the filtering in TIMx_CCMRy.ICzF?
You don't use the Cube, do you? If so, there will be some unnecessary fluff from that one to get rid of.
JW
2015-08-06 03:35 AM
Hmm, I'll admit I didn't even know about the filters, but I checked them and TIM2_CCMRy.ICzF are all 0.
Do you think it's a compiler issue? I'm using the free one and optimisation hardly helped, it reduced the latency from 6us to 5us. Is the Cube fw library a lot less bloated?
2015-08-06 04:22 AM
Once again:
1. Make sure you ARE running at 64MHz: either output the clock onto MCO, or toggle a pin using output compare in a timer. This is suspect #1.
2. Make sure optimization is on. I don't know about your particular IDE and I don't care. IDEs are notorious for hiding and automating things beyond reason, and they usually set little or no optimization for the debugging "profile" or whatever, as otherwise single-stepping in source would make the cursor jump up and down and viewing optimized-out variables would be impossible. gcc's general optimization switch is -O; any of -O2/-O3/-Os should be fine.
3. If in doubt, check the disassembled output, and/or single-step, and/or benchmark using the debug timer/timing facilities of the IDE.
JW
PS. I personally despise using any "library", but in this I obviously deviate from the norm. Cube puts even more "layers" in the way than SPL, but ST pushes Cube and deprecates SPL.
2015-08-06 06:56 AM
1) Yeah, I def am running at 64 MHz, one of the first things I checked - stated in the conditions in the first post as well.
2) I also already tried changing optimisations to -O2/3 early on, and it helped by 1us maybe, from 6us to 5us. But not considerably.
I've basically given up and am going a different route. A bit ridiculous but you roll with the punches I guess. Appreciate the input, cheers mate.
2015-08-06 07:42 AM
> But not considerably
Yeah I don't think it's going to eliminate all the function calls and data manipulation. It's not hard to see several hundred machine cycles getting eaten here. Consider using assembler, and CCMRAM.
If you're capturing both edges you're looking at a 500 kHz interrupt rate.
Consider PWM Input mode; it will halve the interrupt loading. Consider using DMA to collect time stamps.
2015-08-06 09:26 AM
> But not considerably
>
> Yeah I don't think it's going to eliminate all the function calls and data manipulation.
The "libraries" might've been compiled beforehand, in which case the optimization switch change would not be applied to them. It's easy to check for this: rewrite the ISR for direct register access. It's trivial to check and modify a handful of bits.
Assembler might be a good idea when it comes to tight timing control, but the Thumb2 assembler is mostly a nightmare. Running critical code from RAM might speed things up a bit too.
JW
2015-08-10 05:43 AM
// deleted
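The direct-register rewrite of the ISR suggested earlier in the thread could look roughly like the sketch below. It is not the original poster's code. The types, variables and SK_ macros are hypothetical stand-ins so the example compiles off-target; on the real part I'd expect TIM2->SR, GPIOA->BSRR and the bit definitions from stm32f30x.h to take their place, and the bit values used here are assumed to match that header.

```c
#include <stdint.h>

/*
 * Host-side stand-ins (hypothetical) so the sketch compiles off-target.
 * On the real part, drop this section and use TIM2 and GPIOA from
 * stm32f30x.h instead.
 */
typedef struct { volatile uint32_t SR; }   SketchTim;
typedef struct { volatile uint32_t BSRR; } SketchGpio;

static SketchTim  sketch_tim2;
static SketchGpio sketch_gpioa;

#define SK_TIM2      (&sketch_tim2)
#define SK_GPIOA     (&sketch_gpioa)
#define SK_SR_CC1IF  0x0002u  /* capture/compare 1 flag: falling edge (assumed value) */
#define SK_SR_CC2IF  0x0004u  /* capture/compare 2 flag: rising edge (assumed value)  */
#define SK_PIN5      0x0020u  /* PA5, the debug output pin */

void TIM2_IRQHandler_sketch(void)
{
    /* Read the status register once and keep only the two capture flags. */
    uint32_t sr = SK_TIM2->SR & (SK_SR_CC1IF | SK_SR_CC2IF);

    /*
     * Clear only the flags that were actually seen, and do it early.
     * On the real timer these bits clear when 0 is written and ignore 1;
     * the plain-memory stand-in does not emulate that.
     */
    SK_TIM2->SR = ~sr;

    /* Falling edge on channel 1: drive the pin low via the upper BSRR half. */
    if (sr & SK_SR_CC1IF)
    {
        SK_GPIOA->BSRR = SK_PIN5 << 16;
    }

    /* Rising edge on channel 2: drive the pin high via the lower BSRR half. */
    if (sr & SK_SR_CC2IF)
    {
        SK_GPIOA->BSRR = SK_PIN5;
    }
}
```

Compared with the handler posted above, this makes no library calls, reads the status register once instead of twice, and clears only the flags it has handled, so an edge that arrives while the handler is running is not wiped out by the clear. Whether it actually brings the latency down on this board would still need measuring.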