cancel
Showing results for 
Search instead for 
Did you mean: 

Why does the execution time of STM32F767ZIT6 (used in NUCLEO-F767ZI) not grow as expected when the amount of code to execute increases?

Silvano
Associate II

I measure the execution time of a function by putting the toggle of a digital output and the function call into a cycle and measuring the period of the square wave at the digital output with an oscilloscope. The execution time is half a period.

To test the procedure, I used instead of the function a dummy code (see main function in the annex execution_time_main.c) where the cnt_max parameter allows me to vary the amount of code to execute.

The measured periods are visible in the attachment CRAZY_TIMES.jpg

I do not understand why both with the ARM V5 and V6 compiler in Keil uVision the trend of the periods is that shown in the attachment CRAZY_TIMES.jpg

  • Comp V5, test case 8 to 9 cnt_max go from 8000 to 9000 (increases) and  period go from 485 us to 454 us (decreases) !
  • Comp V5, test case 10 to 11 cnt_max go from 10 000 to 100 000 (x 10) and period go from 505 us to 11.1 ms (x 22) !
  • Comp V6, test case 10 to 11 cnt_max go from 10 000 to 100 000 (x 10) and period go from 607 us to 29.6 ms (x 49) !

Thanks

26 REPLIES 26

Hi KiptonM

I did new tests with interrupts disabled.

In TEST_RESULTS.jpg (in ATTACH_FILE_#1.zip) there is a summary of the test.

In ATTACH_FILE_#1.zip there are also list files (.lst) and source files (.c) about two tests case (the code is very short)

Hi TDK.

I did as you advised me.

In TEST_RESULTS.jpg (in ATTACH_FILE_#1.zip) there is a summary of the test tests.

In ATTACH_FILE_#1.zip there are also list files (.lst) and source files (.c) about two tests case (the code is very short)

..., I forgot the attachments ... See ATTACH_FILE_#1.zip

LCE
Principal

Here's how you enable the cycle counter on F767 (I'm using the same Nucleo board):

/* CPU cycle count activation for debugging */
#if DEBUG_CPU_TIMING
	CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
	DWT->LAR = 0xC5ACCE55;
	DWT->CYCCNT = 0;
	DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
	DWT->CTRL |= DWT_CTRL_PCSAMPLENA_Msk;
#endif

Then you can start reading DWT->CYCCNT as shown above by TDL.

I have also found some weird timing behavior, it turned out that part of it depended on compiling as Debug version (using CubeIDE).

But some strange timings remained, I didn't have time to figure it out, I guess it might be some "memory domain border crossing" issue (DTCM, SRAM1, SRAM2).

Silvano
Associate II

Very simple case for analysis

Hi

To simplify the analysis, I have now concentrated on a single anomaly that is highlighted by simply using the exact same source code except for the iterations parameter which assumes different values (100, ... 10 000 000).

Here I have collected the different suggestions given to me by @Community member​, @TDK​ and @KiptonM​. 

  • Interrupt are disabled (__disable_irq(); in line 13).
  • DWT->CYCCNT was used instead of the oscilloscope.
  • All tests now use a single function
  • Disassembly is now available (ATTACH_FILE_#2 list files $1.lst about test case 3 and $2.lst about test case 4)

Follow the code for test case 3 (iterations = 10000).

#include "stm32f767xx.h"
 
const uint32_t iterations = 10000UL;
volatile uint32_t total_for_watch = 0; 
 
void test_function (uint32_t iterations);
 
int main (void) 
{
    uint32_t start = 0;
    uint32_t finish = 0;    
 
    __disable_irq();
 
    for (;;)
    {
        start = DWT->CYCCNT;
        test_function(iterations);
        finish= DWT->CYCCNT;
        total_for_watch = finish - start; 
    }
    
    return 0;
}
 
void test_function (uint32_t iterations)
{
    uint32_t cnt = 0;
 
    /* Dummy code */  
    while (cnt < iterations)
    {   
        cnt++;
    }
}

A summary of the results follows

0693W00000KZvpBQAT.jpgThe anomaly is that in test case 3 to 4 iterations go from 10 000 to 100 000 (x 10) and  total_for_watch go from 20020 cycles to 150029 cycles (x 7.5) ! 

All tests are with Compiler ARM V5 (in Keil uVision 5.36) without optimizations in C99 mode.

Note. In other tests not documented here I have seen that the total_for_watch  values are consistent with time measurements obtained using digital output 

toggles and oscilloscope. I therefore believe the total_for_watch values to be reliable.

Hi TDK.

Skip straight to the bottom of the message list for an easier-to-investigate "anomaly" case (Very simple case for analysis)

Hi KiptonM.

Skip straight to the bottom of the message list for an easier-to-investigate "anomaly" case (Very simple case for analysis)

Hi Tesla DeLorean

Skip straight to the bottom of the message list for an easier-to-investigate "anomaly" case (Very simple case for analysis)

TDK
Guru

You're still not calling the function multiple times within the same program. Create a program with all of them within the same program. Otherwise, you still have a compilation step which is changing the code.

uint32_t start = DWT->CYCCNT;
test_function(100);
uint32_t end1 = DWT->CYCCNT - start; 
test_function(1000);
uint32_t end2 = DWT->CYCCNT - end1; 
test_function(10000);
uint32_t end3 = DWT->CYCCNT - end2; 
test_function(100000);
uint32_t end4 = DWT->CYCCNT - end3; 
test_function(1000000);
uint32_t end5 = DWT->CYCCNT - end4; 

In your disassembly, you can see the compiler is doing different things based on the value you're using. It does seem like an alignment thing.

        0x08000374:    f2427010    B..p    MOV      r0,#0x2710
        0x08000378:    f000f80a    ....    BL       test_function ; 0x8000390
 
 
        0x08000374:    4805        .H      LDR      r0,[pc,#20] ; [0x800038c] = 0x186a0
        0x08000376:    f000f80d    ....    BL       test_function ; 0x8000394

If you feel a post has answered your question, please click "Accept as Solution".
Silvano
Associate II

Hi @Community member​, @KiptonM​, @TDK​  and @Community member​ 

TDK is right. Test results follow

0693W00000Ka3fnQAB.jpgThe code used follows.

#include "stm32f767xx.h"
 
void test_function (uint32_t iterations);
 
uint32_t last_dwt_cyccnt, current_dwt_cyccnt;
 
/* Variable for watch */
volatile uint32_t computing_time_1, computing_time_2, computing_time_3,
                  computing_time_4, computing_time_5, computing_time_6;  
 
int main (void) 
{
    __disable_irq();
 
    for (;;)
    {
        last_dwt_cyccnt = DWT->CYCCNT;
      
        test_function(100UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_1 = current_dwt_cyccnt - last_dwt_cyccnt;
        last_dwt_cyccnt = current_dwt_cyccnt;
      
        test_function(1000UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_2 = current_dwt_cyccnt - last_dwt_cyccnt;
        last_dwt_cyccnt = current_dwt_cyccnt;
 
        test_function(10000UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_3 = current_dwt_cyccnt - last_dwt_cyccnt;
        last_dwt_cyccnt = current_dwt_cyccnt;
      
        test_function(100000UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_4 = current_dwt_cyccnt - last_dwt_cyccnt;
        last_dwt_cyccnt = current_dwt_cyccnt;
      
        test_function(1000000UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_5 = current_dwt_cyccnt - last_dwt_cyccnt;
        last_dwt_cyccnt = current_dwt_cyccnt;
        
        test_function(10000000UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_6 = current_dwt_cyccnt - last_dwt_cyccnt;        
    }
    
    return 0;
}
 
void test_function (uint32_t iterations)
{
    volatile uint32_t cnt = 0;
 
    /* Dummy code */  
    while (cnt < iterations)
    {   
        cnt++;
    }
}

But my original problem was that it seemed strange to me (and therefore I thought I had made some mistakes) that very small changes in the code could result in very large changes in the runtime. For example, consider the following code, which I call %2.

#include "stm32f767xx.h"
 
void test_function (uint32_t iterations);
 
uint32_t last_dwt_cyccnt, current_dwt_cyccnt;
 
/* Variable for watch */
volatile uint32_t computing_time_1, computing_time_2, computing_time_3;  
volatile uint32_t computing_time_4, computing_time_5, computing_time_6;  
 
int main (void) 
{
 
    volatile uint32_t system_core_clock_for_watch = SystemCoreClock; // @1
 
    __disable_irq();
 
    for (;;)
    {
        last_dwt_cyccnt = DWT->CYCCNT;
        test_function(1000000UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_5 = current_dwt_cyccnt - last_dwt_cyccnt;
        last_dwt_cyccnt = current_dwt_cyccnt;
    }
    
    return 0;
}
 
void test_function (uint32_t iterations)
{
    volatile uint32_t cnt = 0;
 
    /* Dummy code */  
    while (cnt < iterations)
    {   
        cnt++;
    }
}

Then consider the following code, which I call %3.

The only difference is that in% 3 the line 14

volatile uint32_t system_core_clock_for_watch = SystemCoreClock; // @1

present in% 2 has been deleted.

#include "stm32f767xx.h"
 
void test_function (uint32_t iterations);
 
uint32_t last_dwt_cyccnt, current_dwt_cyccnt;
 
/* Variable for watch */
volatile uint32_t computing_time_1, computing_time_2, computing_time_3;  
volatile uint32_t computing_time_4, computing_time_5, computing_time_6;  
 
int main (void) 
{
 
    //volatile uint32_t system_core_clock_for_watch = SystemCoreClock; // @1
 
    __disable_irq();
 
    for (;;)
    {
        last_dwt_cyccnt = DWT->CYCCNT;
        test_function(1000000UL);
        current_dwt_cyccnt = DWT->CYCCNT;
        computing_time_5 = current_dwt_cyccnt - last_dwt_cyccnt;
        last_dwt_cyccnt = current_dwt_cyccnt;
    }
    
    return 0;
}
 
void test_function (uint32_t iterations)
{
    volatile uint32_t cnt = 0;
 
    /* Dummy code */  
    while (cnt < iterations)
    {   
        cnt++;
    }
}

!!! computing_time_5 went from 5 444 543 cycles to 15 000 038 cycles (3 x).

See ATTACH_FILE_#3 list files %2.lst about code %2 and ATTACH_FILE_#3 list files %3.lst about code %3 (ARM compiler V5, C99, optimization level 0 (-O0)).

I also found similar behaviors also making other very small changes in the code other than the one now shown or with the ARM 6 compiler.

I am new with ARM. My previous experiences are related to Microchip's simple dsPIC MCUs or Motorola's older 68000 CPUs but I have never experienced such a high sensitivity of execution times towards the code.

Am I the only one who finds this behavior "strange" or for you who move in the ARM world with more experience than me is this behavior normal ?