2022-02-16 11:20 AM
I measure the execution time of a function by putting the toggle of a digital output and the function call into a cycle and measuring the period of the square wave at the digital output with an oscilloscope. The execution time is half a period.
To test the procedure, I used instead of the function a dummy code (see main function in the annex execution_time_main.c) where the cnt_max parameter allows me to vary the amount of code to execute.
The measured periods are visible in the attachment CRAZY_TIMES.jpg
I do not understand why both with the ARM V5 and V6 compiler in Keil uVision the trend of the periods is that shown in the attachment CRAZY_TIMES.jpg
Thanks
2022-02-20 09:46 AM
2022-02-20 09:47 AM
Hi TDK.
I did as you advised me.
In TEST_RESULTS.jpg (in ATTACH_FILE_#1.zip) there is a summary of the test tests.
In ATTACH_FILE_#1.zip there are also list files (.lst) and source files (.c) about two tests case (the code is very short)
2022-02-20 09:49 AM
2022-02-20 10:10 PM
Here's how you enable the cycle counter on F767 (I'm using the same Nucleo board):
/* CPU cycle count activation for debugging */
#if DEBUG_CPU_TIMING
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->LAR = 0xC5ACCE55;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
DWT->CTRL |= DWT_CTRL_PCSAMPLENA_Msk;
#endif
Then you can start reading DWT->CYCCNT as shown above by TDL.
I have also found some weird timing behavior, it turned out that part of it depended on compiling as Debug version (using CubeIDE).
But some strange timings remained, I didn't have time to figure it out, I guess it might be some "memory domain border crossing" issue (DTCM, SRAM1, SRAM2).
2022-02-21 05:27 AM
Very simple case for analysis
Hi
To simplify the analysis, I have now concentrated on a single anomaly that is highlighted by simply using the exact same source code except for the iterations parameter which assumes different values (100, ... 10 000 000).
Here I have collected the different suggestions given to me by @Community member, @TDK and @KiptonM.
Follow the code for test case 3 (iterations = 10000).
#include "stm32f767xx.h"
const uint32_t iterations = 10000UL;
volatile uint32_t total_for_watch = 0;
void test_function (uint32_t iterations);
int main (void)
{
uint32_t start = 0;
uint32_t finish = 0;
__disable_irq();
for (;;)
{
start = DWT->CYCCNT;
test_function(iterations);
finish= DWT->CYCCNT;
total_for_watch = finish - start;
}
return 0;
}
void test_function (uint32_t iterations)
{
uint32_t cnt = 0;
/* Dummy code */
while (cnt < iterations)
{
cnt++;
}
}
A summary of the results follows
The anomaly is that in test case 3 to 4 iterations go from 10 000 to 100 000 (x 10) and total_for_watch go from 20020 cycles to 150029 cycles (x 7.5) !
All tests are with Compiler ARM V5 (in Keil uVision 5.36) without optimizations in C99 mode.
Note. In other tests not documented here I have seen that the total_for_watch values are consistent with time measurements obtained using digital output
toggles and oscilloscope. I therefore believe the total_for_watch values to be reliable.
2022-02-21 05:31 AM
Hi TDK.
Skip straight to the bottom of the message list for an easier-to-investigate "anomaly" case (Very simple case for analysis)
2022-02-21 05:32 AM
Hi KiptonM.
Skip straight to the bottom of the message list for an easier-to-investigate "anomaly" case (Very simple case for analysis)
2022-02-21 05:35 AM
Hi Tesla DeLorean
Skip straight to the bottom of the message list for an easier-to-investigate "anomaly" case (Very simple case for analysis)
2022-02-21 06:44 AM
You're still not calling the function multiple times within the same program. Create a program with all of them within the same program. Otherwise, you still have a compilation step which is changing the code.
uint32_t start = DWT->CYCCNT;
test_function(100);
uint32_t end1 = DWT->CYCCNT - start;
test_function(1000);
uint32_t end2 = DWT->CYCCNT - end1;
test_function(10000);
uint32_t end3 = DWT->CYCCNT - end2;
test_function(100000);
uint32_t end4 = DWT->CYCCNT - end3;
test_function(1000000);
uint32_t end5 = DWT->CYCCNT - end4;
In your disassembly, you can see the compiler is doing different things based on the value you're using. It does seem like an alignment thing.
0x08000374: f2427010 B..p MOV r0,#0x2710
0x08000378: f000f80a .... BL test_function ; 0x8000390
0x08000374: 4805 .H LDR r0,[pc,#20] ; [0x800038c] = 0x186a0
0x08000376: f000f80d .... BL test_function ; 0x8000394
2022-02-22 03:05 AM
Hi @Community member, @KiptonM, @TDK and @Community member
TDK is right. Test results follow
The code used follows.
#include "stm32f767xx.h"
void test_function (uint32_t iterations);
uint32_t last_dwt_cyccnt, current_dwt_cyccnt;
/* Variable for watch */
volatile uint32_t computing_time_1, computing_time_2, computing_time_3,
computing_time_4, computing_time_5, computing_time_6;
int main (void)
{
__disable_irq();
for (;;)
{
last_dwt_cyccnt = DWT->CYCCNT;
test_function(100UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_1 = current_dwt_cyccnt - last_dwt_cyccnt;
last_dwt_cyccnt = current_dwt_cyccnt;
test_function(1000UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_2 = current_dwt_cyccnt - last_dwt_cyccnt;
last_dwt_cyccnt = current_dwt_cyccnt;
test_function(10000UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_3 = current_dwt_cyccnt - last_dwt_cyccnt;
last_dwt_cyccnt = current_dwt_cyccnt;
test_function(100000UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_4 = current_dwt_cyccnt - last_dwt_cyccnt;
last_dwt_cyccnt = current_dwt_cyccnt;
test_function(1000000UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_5 = current_dwt_cyccnt - last_dwt_cyccnt;
last_dwt_cyccnt = current_dwt_cyccnt;
test_function(10000000UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_6 = current_dwt_cyccnt - last_dwt_cyccnt;
}
return 0;
}
void test_function (uint32_t iterations)
{
volatile uint32_t cnt = 0;
/* Dummy code */
while (cnt < iterations)
{
cnt++;
}
}
But my original problem was that it seemed strange to me (and therefore I thought I had made some mistakes) that very small changes in the code could result in very large changes in the runtime. For example, consider the following code, which I call %2.
#include "stm32f767xx.h"
void test_function (uint32_t iterations);
uint32_t last_dwt_cyccnt, current_dwt_cyccnt;
/* Variable for watch */
volatile uint32_t computing_time_1, computing_time_2, computing_time_3;
volatile uint32_t computing_time_4, computing_time_5, computing_time_6;
int main (void)
{
volatile uint32_t system_core_clock_for_watch = SystemCoreClock; // @1
__disable_irq();
for (;;)
{
last_dwt_cyccnt = DWT->CYCCNT;
test_function(1000000UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_5 = current_dwt_cyccnt - last_dwt_cyccnt;
last_dwt_cyccnt = current_dwt_cyccnt;
}
return 0;
}
void test_function (uint32_t iterations)
{
volatile uint32_t cnt = 0;
/* Dummy code */
while (cnt < iterations)
{
cnt++;
}
}
Then consider the following code, which I call %3.
The only difference is that in% 3 the line 14
volatile uint32_t system_core_clock_for_watch = SystemCoreClock; // @1
present in% 2 has been deleted.
#include "stm32f767xx.h"
void test_function (uint32_t iterations);
uint32_t last_dwt_cyccnt, current_dwt_cyccnt;
/* Variable for watch */
volatile uint32_t computing_time_1, computing_time_2, computing_time_3;
volatile uint32_t computing_time_4, computing_time_5, computing_time_6;
int main (void)
{
//volatile uint32_t system_core_clock_for_watch = SystemCoreClock; // @1
__disable_irq();
for (;;)
{
last_dwt_cyccnt = DWT->CYCCNT;
test_function(1000000UL);
current_dwt_cyccnt = DWT->CYCCNT;
computing_time_5 = current_dwt_cyccnt - last_dwt_cyccnt;
last_dwt_cyccnt = current_dwt_cyccnt;
}
return 0;
}
void test_function (uint32_t iterations)
{
volatile uint32_t cnt = 0;
/* Dummy code */
while (cnt < iterations)
{
cnt++;
}
}
!!! computing_time_5 went from 5 444 543 cycles to 15 000 038 cycles (3 x).
See ATTACH_FILE_#3 list files %2.lst about code %2 and ATTACH_FILE_#3 list files %3.lst about code %3 (ARM compiler V5, C99, optimization level 0 (-O0)).
I also found similar behaviors also making other very small changes in the code other than the one now shown or with the ARM 6 compiler.
I am new with ARM. My previous experiences are related to Microchip's simple dsPIC MCUs or Motorola's older 68000 CPUs but I have never experienced such a high sensitivity of execution times towards the code.
Am I the only one who finds this behavior "strange" or for you who move in the ARM world with more experience than me is this behavior normal ?