How do I measure execution time and achieve determinism?

PB1 · ‎2022-05-20

Hello everyone. I would like to prepare some exercises in which my students quantitatively design a real-time application on Nucleo F767ZI boards. To do so, they will need to determine and control the execution time of their code. I am asking for some very general advice on a couple of perhaps basic issues:

1- What technique do you suggest for estimating the execution time (average and worst case) of a block of code? We do not need an extremely precise technique, just something easy for my students to use that yields a reasonable estimate.

2- What must be done to make the execution time of code as deterministic as possible w.r.t. "external" factors (i.e., factors other than the ordinary inputs of the code)? E.g., I have read that the position where code is loaded in memory may affect its execution time, so it is entirely possible that students will see the execution time vary as they rebuild their code. Are there other relevant external factors affecting execution time that we must be aware of? How do we control them?

Thank you very much

Pietro Braione

Uwe Bonnes · ‎2022-05-21

Use the cycle counter in the DWT unit and transfer the data to a controlling PC, or use a scope and some pin configured as EVENTOUT. EVENTOUT is more deterministic than a normal port toggle.

gregstm · ‎2022-05-22

I always like flashing an LED for timing; the board has several user LEDs. I know accurate timing will require an oscilloscope (or similar), but the absence of any LED activity will tell students visually that something is seriously wrong with their code and must be fixed before they can progress.

MasterT · ‎2022-05-23

2 Consider the execution time of the code as undetermined. For very timing-sensitive applications, programmers usually dive into assembly code. For myself, to keep track of the CPU load %, I use a GPIO and an oscilloscope. For any part of the code that requires precision, use a timer interrupt to run the function on a periodic time base.
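The GPIO-and-oscilloscope approach can be sketched as follows. This is a minimal illustration, not the poster's actual code: `probe_t`, `probe_high` and `probe_low` are hypothetical names, and the choice of pin is an assumption. It relies on the STM32 GPIO `BSRR` register, where the lower 16 bits set pins and the upper 16 bits reset them, so each edge is a single store with no read-modify-write.

```c
#include <stdint.h>

/* Hypothetical helper type: one scope probe pin, driven through BSRR. */
typedef struct {
    volatile uint32_t *bsrr;   /* address of the port's BSRR register */
    uint32_t           mask;   /* bit mask of the pin, e.g. 1u << 0   */
} probe_t;

/* Drive the pin high: bits 0..15 of BSRR set the matching pins. */
static inline void probe_high(const probe_t *p)
{
    *p->bsrr = p->mask;
}

/* Drive the pin low: bits 16..31 of BSRR reset the matching pins. */
static inline void probe_low(const probe_t *p)
{
    *p->bsrr = p->mask << 16;
}

/* On target (assuming the pin is already configured as a push-pull output):
 *
 *     static const probe_t probe = { &GPIOB->BSRR, GPIO_PIN_0 };
 *
 *     probe_high(&probe);
 *     block_under_test();      // hypothetical function being timed
 *     probe_low(&probe);
 */
```

On the scope, the width of the high pulse is the execution time of the block. If the pulse is raised on entry to a periodic task and lowered on exit, the duty cycle gives a rough CPU load percentage for that task.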

An LED may work in the absence of a scope; just use a divider so that it flashes with a period in the range 0.2-10 s at a 50% duty cycle. For example, a 48 kHz audio sampling clock can reasonably be divided by 10,000.
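The divider idea might look like the sketch below. The names `LED_DIVIDER`, `led_count` and `led_divider_tick` are illustrative assumptions; the 48 kHz rate and the divide-by-10,000 figure come from the post. Toggling once every 10,000 samples gives 4.8 toggles per second, i.e. a blink period of roughly 0.42 s at 50% duty cycle, which is inside the suggested 0.2-10 s range.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_RATE_HZ 48000u   /* audio sampling clock from the post       */
#define LED_DIVIDER    10000u   /* toggle the LED once every 10,000 samples */

static uint32_t led_count;      /* samples seen since the last toggle */

/* Call once per sample (e.g. from the sampling interrupt).
 * Returns true on every LED_DIVIDER-th call, when the LED should toggle. */
static inline bool led_divider_tick(void)
{
    if (++led_count >= LED_DIVIDER) {
        led_count = 0u;
        return true;
    }
    return false;
}

/* On target, inside the periodic interrupt handler (pin is an assumption):
 *
 *     if (led_divider_tick()) {
 *         HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_0);
 *     }
 */
```

If the code that feeds the interrupt stalls, the LED stops blinking or blinks at the wrong rate, which is visible without any instrument.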

Tesla DeLorean · ‎2022-05-23

Problem with F7 is you're dealing with a superscalar MCU, with caches, speculative fetches, random eviction, and larger line prefetch/cache on the FLASH memory. Don't recall the details for the F767 but it's probably got 128-bit flash lines. And it has a different core and cache than the F746 and F722 parts.

On a flash line miss there's quite a hit for the initial fetch, but subsequent words in the line prefetch faster than RAM.

Generally I'd use the DWT CYCCNT to benchmark and evaluate code performance, getting some best, worst, and average figures to drive optimization efforts.
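A minimal sketch of collecting best, worst and average figures with DWT CYCCNT is shown below. The `cyc_*` names are hypothetical helpers, not part of CMSIS or HAL. The target branch assumes the CMSIS device header `stm32f7xx.h` and assumes that the DWT lock register has to be written on this Cortex-M7 part; the other branch is only a host stand-in so the bookkeeping can be tried off-target.

```c
#include <stdint.h>

#if defined(STM32F767xx)
#include "stm32f7xx.h"   /* CMSIS device header (assumed): DWT, CoreDebug */

static inline void cyc_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; /* enable trace/DWT block */
    DWT->LAR = 0xC5ACCE55u;  /* unlock DWT; assumed to be needed on this part */
    DWT->CYCCNT = 0u;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;            /* start the cycle counter */
}
static inline uint32_t cyc_now(void) { return DWT->CYCCNT; }
#else
/* Host stand-in: a plain variable instead of the hardware counter. */
static uint32_t fake_cycles;
static inline void cyc_init(void) { fake_cycles = 0u; }
static inline uint32_t cyc_now(void) { return fake_cycles; }
#endif

/* Running statistics over many measurements of the same block. */
typedef struct {
    uint32_t min;   /* best case, in cycles  */
    uint32_t max;   /* worst case observed   */
    uint64_t sum;   /* total, for the average */
    uint32_t n;     /* number of samples      */
} cyc_stats_t;

static inline void cyc_stats_reset(cyc_stats_t *s)
{
    s->min = UINT32_MAX;
    s->max = 0u;
    s->sum = 0u;
    s->n   = 0u;
}

/* Record one measurement. Unsigned subtraction stays correct across a
 * single wrap of the 32-bit counter. */
static inline void cyc_stats_add(cyc_stats_t *s, uint32_t start, uint32_t end)
{
    uint32_t d = end - start;
    if (d < s->min) { s->min = d; }
    if (d > s->max) { s->max = d; }
    s->sum += d;
    s->n++;
}

static inline uint32_t cyc_stats_avg(const cyc_stats_t *s)
{
    return (s->n != 0u) ? (uint32_t)(s->sum / s->n) : 0u;
}

/* Typical use around the block under test:
 *
 *     uint32_t t0 = cyc_now();
 *     block_under_test();                    // hypothetical
 *     cyc_stats_add(&stats, t0, cyc_now());
 */
```

Assuming the core runs at 216 MHz, dividing a cycle count by 216 gives microseconds, and the 32-bit counter wraps after roughly 20 s, so a single measured block must be shorter than that. The worst case recorded this way is only the worst case observed, not a guaranteed bound; on a cached part the first run after reset is usually worth recording separately.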

Reconstructive trace might also help with profiling, PDQ Logic has such a tool, and a plug-in board to get trace off NUCLEO-144 boards.

https://www.pdqlogic.com/

https://www.pdqlogic.com/what-is-trace/

https://www.pdqlogic.com/#Nucleo-Converter

https://www.st.com/resource/en/application_note/an4667-stm32f7-series-system-architecture-and-performance-stmicroelectronics.pdf

https://www.st.com/resource/en/application_note/dm00272913-level-1-cache-on-stm32f7-series-and-stm32h7-series-stmicroelectronics.pdf
