Tips for maximizing performance and minimizing latency of interrupt handlers on STM32F0/Cortex-M0?

TwelveSquared · ‎2021-03-01

Also, what is the minimum time (or instruction cycles) to begin servicing an interrupt, i.e., how long does state saving take at minimum? Where would I find this kind of performance information? I am using STM32F031G6.

Javier1 · ‎2021-03-01

The question here is: how far are you willing to go?

You can get better performance if you replace HAL_ or LL calls with direct register access (like TIM->CNT).

And of course you could also dare to code in assembler.

KnarfB · ‎2021-03-01

A good intro: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/beginner-guide-on-interrupt-latency-and-interrupt-latency-of-the-arm-cortex-m-processors

Factors such as flash and RAM latency and higher-priority interrupts may add cycles.

It can be useful to measure interrupt latency and jitter in a live system, e.g. by toggling some GPIOs. This should be done at register level in the real interrupt handler, before LL, HAL, or anything else kicks in.
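A minimal sketch of such a register-level probe, assuming an STM32F0-style GPIO where writing bit n of BSRR sets pin n and writing bit n + 16 resets it. The helper names and the idea of passing the register address as a parameter are illustrative only; on the target you would typically pass something like &GPIOA->BSRR from the CMSIS device header and have the pin already configured as an output.

```c
#include <stdint.h>

/* Assumption: STM32F0-style BSRR. Bit n sets pin n, bit (n + 16) resets it.
 * Each helper is a single store, with no read-modify-write sequence. */
static inline void probe_high(volatile uint32_t *bsrr, unsigned pin)
{
    *bsrr = 1u << pin;
}

static inline void probe_low(volatile uint32_t *bsrr, unsigned pin)
{
    *bsrr = 1u << (pin + 16u);
}

/* Hypothetical handler body: raise the probe pin first, do the work, lower it. */
static inline void isr_body(volatile uint32_t *bsrr, unsigned pin)
{
    probe_high(bsrr, pin);   /* first scope edge: handler code has started */
    /* ... actual interrupt work goes here ... */
    probe_low(bsrr, pin);    /* second scope edge: handler is about to return */
}
```

On a scope, the time from the triggering hardware event to the first edge approximates the entry latency, and the spread of that time over many events shows the jitter.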

hth

KnarfB

Tesla DeLorean · ‎2021-03-01

Joseph Yiu's books have generally had good information on the interrupt context and tail-chaining.

https://developer.arm.com/documentation/dui0497/a/the-cortex-m0-processor/exception-model/exception-entry-and-return

The CM3 was 12 cycles, and I doubt the CM0 is more efficient.

https://interrupt.memfault.com/blog/arm-cortex-m-exceptions-and-nvic

Generally, have the vector table and code (the entire call tree) in RAM, and avoid unnecessary bloat, call chains, and callbacks.

Flash access on processors without any caching (ART cache, or whatever) will be significantly slowed by wait states. A CM0 clocked at 24 MHz with zero wait states will outrun one at 27 MHz with one wait state, so you'll have to measure how fast you need to clock it before execution actually gets faster.
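To put rough numbers on that comparison, here is a deliberately crude model in C. It assumes every fetch from flash costs 1 + wait_states core cycles and that nothing (such as a prefetch buffer) hides that cost, so it is a pessimistic bound rather than a datasheet figure. The function name is made up for illustration.

```c
#include <stdint.h>

/* Crude model (an assumption, not a datasheet figure): each flash fetch costs
 * 1 + wait_states core cycles, with no prefetch or cache to hide the penalty.
 * Returns the resulting fetch rate in kHz. */
static uint32_t effective_khz(uint32_t core_khz, uint32_t wait_states)
{
    return core_khz / (1u + wait_states);
}
```

Under this model, effective_khz(24000, 0) gives 24000 while effective_khz(27000, 1) gives 13500, which is the sense in which the slower clock wins. Real parts usually land somewhere in between, which is why measuring is the only reliable answer.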

TwelveSquared · ‎2021-03-01

Thanks everyone for your replies, and links.

@Javier Muñoz, I go pretty far. First, I am already not using HAL or LL. CubeMX is an extremely valuable resource and I use it to quickly find a "recipe" that works. But once I know the "recipe" I code it using direct register programming. Not in ASM, though. That's the compiler's job. 🙂

From one of the above links, I learned that there should be a 16-cycle setup time from the interrupt being pended to the ISR running, provided there are no wait states. The code is currently in flash, so I will try @Community member's advice to put the vector table and ISRs in RAM. With the code in flash and the STM32F031G6 at 48 MHz (the maximum for this chip), it appears to take about 740-820 ns of "setup time" (80 ns of jitter) from hardware event to the start of my interrupt code. Some interrupts start earlier, at around 610 ns, probably saving time through tail-chaining or other optimizations.
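For comparing those measurements with the 16-cycle figure, a small conversion sketch in C. The helper names are made up for illustration, and the only inputs are the 48 MHz clock and the times quoted above.

```c
#include <stdint.h>

/* Convert a time in nanoseconds to core clock cycles, rounded to nearest. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t core_mhz)
{
    return (ns * core_mhz + 500u) / 1000u;
}

/* Convert core clock cycles to nanoseconds, rounded to nearest. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_mhz)
{
    return (cycles * 1000u + core_mhz / 2u) / core_mhz;
}
```

At 48 MHz, cycles_to_ns(16, 48) is 333 ns, while ns_to_cycles(740, 48) and ns_to_cycles(820, 48) come out at 36 and 39 cycles, and the 610 ns case at 29 cycles. So the measured figures are roughly 13 to 23 cycles above the zero-wait-state number, and the 80 ns of jitter is about 4 cycles.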

Any other "easy" tips like moving the code and vector table to RAM?

Uwe Bonnes · ‎2021-03-01

If there are flash wait states, they can be avoided by placing the jump table (vector table) and the interrupt handlers in RAM.
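A rough sketch of the vector table part in C, under several assumptions that should be checked against the reference manual: the Cortex-M0 has no VTOR register, so the table is copied to the very start of SRAM and SRAM is then remapped to address 0 through the MEM_MODE field of SYSCFG_CFGR1; the table has 48 entries; and the SYSCFG clock is already enabled. The function name and the pointer parameters are illustrative. On the target they would point at the start of SRAM, the flash vector table, and the real SYSCFG_CFGR1 register.

```c
#include <stddef.h>
#include <stdint.h>

#define VECTOR_COUNT         48u   /* assumption: 16 core + 32 device vectors */
#define SYSCFG_MEM_MODE_MASK 0x3u  /* assumption: MEM_MODE is bits [1:0] of SYSCFG_CFGR1 */
#define SYSCFG_MEM_MODE_SRAM 0x3u  /* assumption: 0b11 maps embedded SRAM at address 0 */

/* Copy the vector table into RAM, then select SRAM as the memory seen at
 * address 0 so that the core fetches its vectors from RAM. */
static void relocate_vectors(volatile uint32_t *ram_table,
                             const uint32_t *flash_table,
                             volatile uint32_t *syscfg_cfgr1)
{
    for (size_t i = 0; i < VECTOR_COUNT; i++) {
        ram_table[i] = flash_table[i];
    }
    *syscfg_cfgr1 = (*syscfg_cfgr1 & ~SYSCFG_MEM_MODE_MASK) | SYSCFG_MEM_MODE_SRAM;
}
```

The linker script has to keep the first 192 bytes of SRAM free for the copied table. Placing the handlers themselves in RAM is a separate linker-script and startup-code matter that is not shown here.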

waclawek.jan · ‎2021-03-01

> from hardware event to start of my interrupt code

If you insist on using C, there may be a C function prologue which delays the perceived ISR entry, too.

You may achieve low best-case latency by optimizing the ISR itself, but then there are other influences that introduce latency jitter.

Delay/jitter worsens with running code (other than the ISR) from memory with high latency, using data memory with high latency, using multi-cycle uninterruptible instructions, accessing peripherals through buses with divided clocks, accessing peripherals with longer access times (inserted wait states, maybe synchronizing from other clock domains such as RTC), through bus conflicts with other bus masters, and of course using interrupt disable/enable sequences and using other interrupts with higher priority. I may have omitted some other sources of delay/jitter. All these things may combine, and the worst case may be hard to estimate, let alone calculate.

As a rule of thumb, in 32-bitters, try to avoid using interrupts for any process requiring latency below several hundred cycles (i.e. tens of µs).

JW