LMorr.3
Senior
June 20, 2022
Question

How do I calculate how many clock cycles are needed to run block of code?

Is it possible to calculate how many clock ticks a function or select block of code will use up?

    4 replies

    Tesla DeLorean
    Guru
    June 21, 2022

    On most STM32 parts you can use the DWT's CYCCNT cycle counter.

    On CM0(+) parts, which have no CYCCNT, perhaps use one of the TIM peripherals instead (TIM2 or TIM5 are 32-bit on some STM32).

    #include <stdint.h>

    volatile uint32_t *DWT_CYCCNT  = (volatile uint32_t *)0xE0001004; // DWT cycle count register
    volatile uint32_t *DWT_CONTROL = (volatile uint32_t *)0xE0001000; // DWT control register
    volatile uint32_t *DWT_LAR     = (volatile uint32_t *)0xE0001FB0; // DWT lock access register
    volatile uint32_t *SCB_DEMCR   = (volatile uint32_t *)0xE000EDFC; // debug exception and monitor control register

    {
     uint32_t x, y;
     uint32_t Cycles;

     *SCB_DEMCR |= 0x01000000; // set TRCENA (bit 24) to enable the DWT unit
     *DWT_LAR = 0xC5ACCE55; // unlock write access (required on some parts, e.g. Cortex-M7)
     *DWT_CYCCNT = 0; // reset the counter
     *DWT_CONTROL |= 1; // set CYCCNTENA (bit 0) to enable the counter

     x = *DWT_CYCCNT; // cycle count before the code under test
    ...
     y = *DWT_CYCCNT; // cycle count after the code under test

     Cycles = (y - x); // elapsed cycles, including the overhead of the two reads
    }
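
The subtraction y - x in the snippet above stays correct even if CYCCNT rolls over once between the two reads, because unsigned 32-bit arithmetic wraps modulo 2^32. Here is a minimal sketch of that property which can be tried on a PC. The helper names cycles_elapsed and cycles_to_us are hypothetical, and the core clock is an assumption passed in by the caller rather than read from hardware.

```c
#include <stdint.h>

/* Elapsed cycles between two counter samples. Unsigned subtraction wraps
   modulo 2^32, so the result is correct across at most one rollover. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to whole microseconds for a given core clock in Hz.
   The clock value is assumed to be known by the caller. */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000u) / core_hz);
}
```

A 32-bit counter rolls over after roughly 25 seconds at 168 MHz, so anything that can run longer than that needs a different approach (a slower timer, or counting rollovers).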

    Tips, Buy me a coffee, or three.. PayPal Venmo (See Profile) Up vote any posts that you find helpful, it shows what's working..
    LMorr.3 (Author)
    Senior
    June 21, 2022

    Did not know about this, thanks!

    Danish1
    Lead III
    June 21, 2022

    In general the answer is no. It is easy to write code where the number of loop iterations depends on some complicated function of the input value(s), so the execution time varies just as much.

    And even without loops and branches (e.g. if-statements) it can be hard to get an exact number of cycles. Many stm32* have things like caches that reduce the number of cycles it takes to read or write memory, so the total number of cycles depends on whether or not the cache was able to help.

    And the ARM core might not be the only thing accessing memory - there might be DMA fighting for access over the bus-matrix as well.

    *But not the "simpler" ones, e.g. stm32l0, stm32f0

    Having said all this, for a lot of code the number of cycles might only vary by less than (say) 10%. So an empirical approach of measuring it - as described above - is often good enough if you are only interested in the stm32 having enough time to complete its tasks, not using the execution-time as the basis for a delay.
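
To put a number on how much a block really varies, one option is to measure it many times and keep the fastest and slowest results. This is only a sketch: cycle_stats and its helper functions are hypothetical names, and each sample is assumed to be an elapsed count such as the y - x value from the CYCCNT snippet earlier in the thread.

```c
#include <stdint.h>

/* Running minimum and maximum over repeated measurements. */
typedef struct {
    uint32_t min;
    uint32_t max;
    uint32_t count;
} cycle_stats;

static void cycle_stats_init(cycle_stats *s)
{
    s->min = UINT32_MAX;
    s->max = 0u;
    s->count = 0u;
}

/* Record one elapsed-cycle measurement. */
static void cycle_stats_add(cycle_stats *s, uint32_t elapsed)
{
    if (elapsed < s->min) s->min = elapsed;
    if (elapsed > s->max) s->max = elapsed;
    s->count++;
}

/* Gap between the slowest and fastest run, as a whole percentage of the
   fastest run. Returns 0 when there are no samples or the minimum is 0. */
static uint32_t cycle_stats_spread_percent(const cycle_stats *s)
{
    if (s->count == 0u || s->min == 0u) return 0u;
    return (uint32_t)(((uint64_t)(s->max - s->min) * 100u) / s->min);
}
```

The maximum only covers the inputs and bus conditions that actually occurred during the test runs, so it is an observed figure and not a guaranteed worst case.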

    (I remember seeing code for extremely simple microcontrollers without timer peripherals, where great effort went into ensuring each branch of possible program flow took exactly the same number of cycles. Things have improved since then.)

    Hope this helps,

    Danish

    Tesla DeLorean
    Guru
    June 21, 2022

    Microsoft added something similar to MASM back in the 6.X era, and I built several annotation tools for the MC68000 and 68020 I was using at one point, being one of those engineers who writes software.

    TBH I can do static code reduction in my head, and find dynamic analysis to be more fun when optimizing algorithms or complex system interactions.

    gregstm
    Senior II
    June 22, 2022

    When optimising time-critical assembly language, to get predictable timing, I have made sure the instructions are aligned to a 4-byte (word) boundary, and sometimes ensured all instructions are 32 bits long. It's the memory accesses that are more complicated, and if you are trying to save cycles, it is usually more efficient to load multiple registers with data in one instruction.

    LMorr.3 (Author)
    Senior
    June 22, 2022

    I'm also using the FreeRTOS idle task hook to gauge idle time. From the docs:

    "Measuring the amount of spare processing capacity. (The idle task will run only when all higher priority application tasks have no work to perform; so measuring the amount of processing time allocated to the idle task provides a clear indication of how much processing time is spare.)"
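
One simple way to use the hook for this is to count how often it runs and compare that with a baseline taken while no other tasks had work to do. This is a sketch under assumptions: vApplicationIdleHook is the FreeRTOS hook name (called when configUSE_IDLE_HOOK is 1), but idle_take_sample, idle_percent and the baseline calibration are hypothetical, and the sampling is assumed to run at a fixed period from a single task.

```c
#include <stdint.h>

/* Free-running count of idle-hook calls. Written only by the idle task. */
static volatile uint32_t idle_count;

/* Value of idle_count at the previous sample. Used only by the sampler. */
static uint32_t idle_last;

/* Called by FreeRTOS from the idle task when configUSE_IDLE_HOOK is 1.
   It must never block. */
void vApplicationIdleHook(void)
{
    idle_count++;
}

/* Hypothetical helper: number of idle-hook calls since the previous sample.
   The counter is never reset, so the unsigned difference handles rollover. */
static uint32_t idle_take_sample(void)
{
    uint32_t now = idle_count;
    uint32_t delta = now - idle_last;
    idle_last = now;
    return delta;
}

/* Hypothetical helper: idle share in percent, given a baseline sample taken
   over the same period with the system otherwise idle. Clamped to 0..100. */
static uint32_t idle_percent(uint32_t sample, uint32_t baseline)
{
    if (baseline == 0u) return 0u;
    if (sample >= baseline) return 100u;
    return (uint32_t)(((uint64_t)sample * 100u) / baseline);
}
```

Counting hook calls is coarse: time spent in interrupts is not separated out, and the result is only meaningful relative to the baseline.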