2019-05-24 06:12 AM
Hello,
I am using the STM32F767ZI board and I want to measure the execution time of a function that I wrote.
At the moment, I use timer 9 clocked from the internal clock (216 MHz) with:
Prescaler = 108
Counter Mode = Up
Counter Period = 59
In my program, to measure the execution time of the function, I use HAL_TIM_Base_Start_IT, HAL_TIM_Base_Stop_IT and HAL_TIM_PeriodElapsedCallback.
HAL_TIM_PeriodElapsedCallback is redefined in the main.c file and increments a variable that allows me to know the number of times the timer has interrupted.
void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim){
    if(htim->Instance == htim9.Instance){
        FinChrono = FinChrono+1;
    }
}
I was told that it was possible to measure the execution time without using interrupt with the timers but I can’t understand how.
Can you help me?
2019-05-24 07:04 AM
Note that using the Callback function (or in general, interrupts) will add time to the function execution.
You can minimize the effect by not using the Timer interrupt, and just read the Timer Counter directly.
However, you have to make sure the timer does not overflow: the timer's maximum count multiplied by its tick period must be greater than the worst-case execution time of the function.
If you need to use a prescaler, then you also have to consider that the measurement resolution becomes one prescaled tick, so the result is correspondingly coarser.
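The overflow condition can be checked with a little arithmetic. The sketch below is only an illustration: `timer_window_s` is a hypothetical helper name, and it assumes that the prescaler and period values are the raw PSC and ARR register contents, which the hardware uses as divide-by-(PSC+1) and count-(ARR+1).

```c
#include <stdint.h>

/* Longest interval, in seconds, that the timer can count before it wraps.
 * psc and arr are assumed to be the raw PSC and ARR register values:
 * the counter clock is timer_clk_hz / (psc + 1) and one period is
 * (arr + 1) counter ticks. */
static double timer_window_s(uint32_t timer_clk_hz, uint32_t psc, uint32_t arr)
{
    double clocks_per_period = ((double)psc + 1.0) * ((double)arr + 1.0);
    return clocks_per_period / (double)timer_clk_hz;
}
```

With the settings from the question (216 MHz, prescaler 108, period 59) this gives a window of about 30 µs, so any function that runs longer needs either the interrupt counter or a larger period. Raising the period to 0xFFFF with the same prescaler stretches the window to roughly 33 ms.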
I would recommend using GPIO pins and an external oscilloscope or logic analyzer for this kind of measurement.
2019-05-24 08:21 AM
Set a TIM up in free-running, maximal mode, i.e. Period=0xFFFF or 0xFFFFFFFF for a 16-bit or 32-bit timer respectively.
Read the value of TIMx->CNT on either side of the code you want to benchmark.
The Core also has a 32-bit DWT->CYCCNT register, which will count ticks in 216 MHz clock cycles.
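A minimal sketch of the "read the counter on either side" idea follows. The helper name `ticks_elapsed` and the name `function_under_test` are hypothetical; the register access itself is shown only in a comment because it needs the device header and real hardware.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running up-counter.
 * width_mask is 0xFFFFu for a 16-bit timer and 0xFFFFFFFFu for a 32-bit
 * timer or DWT->CYCCNT. Unsigned subtraction followed by the mask gives
 * the right answer even if the counter wrapped once between the reads. */
static inline uint32_t ticks_elapsed(uint32_t start, uint32_t end,
                                     uint32_t width_mask)
{
    return (end - start) & width_mask;
}

/* On the target (not compiled here, requires the device header):
 *
 *     uint32_t t0 = TIM9->CNT;
 *     function_under_test();            // hypothetical name
 *     uint32_t t1 = TIM9->CNT;
 *     uint32_t dt = ticks_elapsed(t0, t1, 0xFFFFu);
 */
```

The result is only valid if the code under test takes less than one full counter period; a second wrap cannot be detected from two reads alone.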
2019-05-26 10:43 AM
@Community member : The DWT->CYCCNT register is very interesting. I didn't know about it. Thanks.
I use SysTick as a free-running timer (maximal SysTick->LOAD=0xffffff, no interrupts) as a CPU-independent utility for these kinds of generic timing tasks. But at 216 MHz it maxes out at 78 ms and the 32-bit CYCCNT would give 19.9 seconds.
Do you know if permanently enabling DWT_CTRL->CYCCNTENA would interfere with GDB/Eclipse/ST-Link debugging, or vice-versa? It seems there's no DWT on Cortex-M0+, but on M3/M4/M7 the increased timing range would be very useful.
2019-05-26 11:32 AM
The SysTick counter is 24-bit and is a down-counter.
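Because SysTick counts down, the subtraction is reversed compared with an up-counting TIM. A small sketch, assuming SysTick->LOAD is set to 0xFFFFFF as described above so that the counter wraps modulo 2^24 (`systick_elapsed` is a hypothetical helper name):

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu   /* SysTick->VAL is 24 bits wide */

/* Elapsed ticks between two reads of SysTick->VAL.
 * The counter runs downwards, so the earlier reading is the larger one
 * unless the counter reloaded in between; the mask handles that case. */
static inline uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    return (start - end) & SYSTICK_MASK;
}
```

As with any free-running counter, this covers at most one wrap, which is the 78 ms limit at 216 MHz mentioned earlier.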
The CM0(+) lacks a lot of the DWT/ITM support hardware.
On the CM0(+) parts I'd likely use a TIM, and time short spans of code, or time hundreds of thousands of iterations to get fractional accuracy. The big problem here is that if you have a wait-state on the flash it is very hard to overcome.
I use Keil, and stand-alone operation, DWT->CYCCNT works effectively there. You could check if the counter is enabled, or not, and you don't need to reset the register to take delta measurements.
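One way to "check if the counter is enabled" without disturbing a debugger that already turned it on is sketched below. The function name is hypothetical, and the registers are passed in as pointers so the logic can also be exercised off-target; on hardware they would be the DEMCR and DWT control registers. Some cores may need an additional unlock step before the writes take effect, which this sketch does not cover.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMCR_TRCENA       (1u << 24)  /* trace enable bit in DEMCR      */
#define DWT_CTRL_CYCCNTENA (1u << 0)   /* cycle counter enable, DWT CTRL */

/* Turn the cycle counter on only if it is not already running.
 * Returns true when it was already enabled (for example by a debugger),
 * in which case nothing is written and the count is left untouched. */
static bool cyccnt_ensure_enabled(volatile uint32_t *demcr,
                                  volatile uint32_t *dwt_ctrl)
{
    bool was_running = ((*demcr & DEMCR_TRCENA) != 0u) &&
                       ((*dwt_ctrl & DWT_CTRL_CYCCNTENA) != 0u);
    if (!was_running) {
        *demcr    |= DEMCR_TRCENA;        /* enable the DWT block first */
        *dwt_ctrl |= DWT_CTRL_CYCCNTENA;  /* then start the counter     */
    }
    return was_running;
}
```

The counter is never reset here, which fits the point above that delta measurements do not need a zeroed register.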
2019-05-26 01:15 PM
Thanks for the insights. Yes, I wish SysTick was 32 bits and/or that Cortex-M0+ had DWT.
My interest in using SysTick or DWT->CYCCNT instead of a TIM is that the choice of the latter depends on what timers the MCU has and which ones the application code is using for other purposes. I'm trying to make my timing utility as generic and non-invasive as possible. (I do low-level bare-metal coding, so no conflict with an RTOS or HAL using SysTick for a 1ms interrupt.) Also, 32-bit TIMs are rare and precious (or non-existent) and 16-bit TIMs are even worse than the 24-bit SysTick for my use case.
Thanks for pointing out the flash wait-state issue. Always something to consider. I'm usually trying to compare different algorithms and code implementations, so counting ticks is sufficient even if they don't map to microseconds by a constant factor. (As long as I always just compare running-in-flash to running-in-flash and running-in-RAM to running-in-RAM.)
I'll have to experiment and see if GDB turns DWT_CTRL->CYCCNTENA on and off, preventing me from using it for my counter.
2019-05-27 05:48 AM
Thank you for your answers.
In my case, I can't use any component external to the board.
My function must work between 100 Hz and 1000 Hz and the accuracy of the counter must be about 10 µs.
I have already used the function that counts ticks but the problem is that it only has millisecond resolution.
Reading the value of TIMx on both sides of the code interests me but what function do you use?
Besides, how do you recover the value of TIMx? I looked at the functions associated with the timer but I did not find the variable that returns this value.
2019-05-27 08:41 AM
I don't use HAL so I don't know what HAL (or LL) function returns the timer value, but you can almost certainly directly read the register using TIMx->CNT as Clive Two.Zero suggested. Note that this requires #include'ing stm32f767xx.h, but if you have any HAL or LL #includes it is almost certainly included somewhere down the .h files chain.
The "x" in "TIMx" is the timer number. The MCU on your STM32F767ZI board has timers 1 through 14, and they come in several different "flavors" (different capabilities). You might want to use TIM2 or TIM5 because they have 32-bit counters where all the rest are 16-bit. Given your 100 Hz sampling speed and the STM32F767xx MCU's maximum clock speed of 216 MHz you need to consider counter overflow issues.
There are many different ways of doing this. You use a free-running timer and get the CNT before and after and subtract, in the order that matches whether the timer is counting up or down. You can run the timer in OPM ("one-pulse mode"), start it before your function and read it after. Many of the timers have "prescalers" which divide the main clock (usually, although it can be otherwise) rate -- that will affect your timing granularity and also whether the counter will overflow at your 100 Hz rate.
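To make the prescaler trade-off concrete, here is a rough sketch with hypothetical helper names. It assumes the timer is clocked at 216 MHz, as stated in the question for timer 9; TIM2 and TIM5 sit on a different bus and may be clocked differently, so check the clock tree before reusing the numbers.

```c
#include <stdint.h>

/* PSC register value that yields the requested tick rate.
 * Returns UINT32_MAX when the division is inexact or the divider does
 * not fit the 16-bit prescaler (divide ratios 1..65536). */
static uint32_t psc_for_tick_rate(uint32_t timer_clk_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || (timer_clk_hz % tick_hz) != 0u) {
        return UINT32_MAX;
    }
    uint32_t divider = timer_clk_hz / tick_hz;
    if (divider == 0u || divider > 65536u) {
        return UINT32_MAX;
    }
    return divider - 1u;   /* hardware divides by PSC + 1 */
}

/* Counter ticks that elapse during one period of a periodic event. */
static uint32_t ticks_per_period(uint32_t tick_hz, uint32_t event_hz)
{
    return tick_hz / event_hz;
}
```

For a 1 µs tick the prescaler comes out as 215. One period at 100 Hz is then 10000 ticks, which fits in a 16-bit counter (65536 ticks, about 65 ms), and the 1 µs granularity is finer than the 10 µs accuracy asked for.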
You should be able to set up the timer using HAL, but I've found it just as easy to do lowest-level register programming because you need to understand the details of the timers anyway. Either way it shouldn't interfere with the rest of your HAL code as long as you're not re-using a timer that's used for something else.
In terms of accuracy, note that interrupts will affect your counts. HAL (I'm 99% sure) uses SysTick as a 1ms interrupt source (that's where the HAL "ticks" come from) so one of those (or others) could hit in the middle of your timing.
There are lots of subtle issues to consider, but the basics are simple. I don't have time right now to write example code for you, and there are others here who know much more than I do, but do some research in "RM0410 Reference manual STM32F76xxx and STM32F77xxx advanced Arm®-based 32-bit MCUs" and I'll try to answer any questions you have.
2019-05-27 09:04 AM
The DWT/ITM blocks are locked on CM7 implementations.
volatile unsigned int *DWT_CYCCNT = (volatile unsigned int *)0xE0001004; //address of the register
volatile unsigned int *DWT_CONTROL = (volatile unsigned int *)0xE0001000; //address of the register
volatile unsigned int *DWT_LAR = (volatile unsigned int *)0xE0001FB0; //address of the register
volatile unsigned int *SCB_DEMCR = (volatile unsigned int *)0xE000EDFC; //address of the register
*SCB_DEMCR |= 0x01000000;
*DWT_LAR = 0xC5ACCE55; // unlock
*DWT_CYCCNT = 0; // reset the counter
*DWT_CONTROL |= 1 ; // enable the counter
Seem to remember this sequence working (or swapping first two lines)
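Once the counter is running, a measurement is two reads of the register and a subtraction, and the cycle count can be converted to time. A small sketch, with `cycles_to_us` and `function_under_test` as hypothetical names and the 216 MHz core clock taken from earlier in the thread:

```c
#include <stdint.h>

/* Convert a cycle count to whole microseconds (rounded down).
 * The multiplication is done in 64 bits so that large counts do not
 * overflow before the division. */
static inline uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000u) / cpu_hz);
}

/* On the target, using the DWT_CYCCNT pointer from the snippet above
 * (not compiled here):
 *
 *     uint32_t t0 = *DWT_CYCCNT;
 *     function_under_test();                 // hypothetical name
 *     uint32_t cycles = *DWT_CYCCNT - t0;    // wrap-safe for one wrap
 *     uint32_t us = cycles_to_us(cycles, 216000000u);
 */
```

At 216 MHz one microsecond is 216 cycles, so the resolution is well below the 10 µs requested, and the 32-bit counter only wraps after about 19.9 seconds.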
2019-05-27 09:38 AM
Thanks for the info. Will save a lot of hair-pulling if and when I go the DWT_CYCCNT route.
Did some research online yesterday and came across https://stackoverflow.com/questions/36378280/stm32-how-to-enable-dwt-cycle-counter which agrees with your methodology. I can't seem to find either "0xC5ACCE55" or even the DWT->LAR register in the ARMv7-M Architecture Reference Manual (29 June 2018 "E.d" release), but at least DWT->LAR is in core_cm7.h.
Also researched and found mention -- and code -- which indicates OpenOCD doesn't use CYCCNT. Nevertheless, I'll probably keep using SysTick until I absolutely need the 32-bit width.
Thanks again for all your help.