STM32 M0 and M3 caching

Nickelgrass · 2019-10-20

Hello,

I have a general question about CPU speed and instruction caching. I have worked a lot with 8-bit AVR µCs, which typically run at up to 16 MHz. There it is always clear how fast a piece of assembler code will run, when each instruction is executed, and how long it will take.

The STM32s are a lot faster, and I have seen the same code take different amounts of time to execute. I understand what caching is for, but I don't understand how the CPU decides when to cache something. Or does the compiler take all this into account?

Is there any way to precisely predict the caching, or even influence it? This would be necessary, for example, when bit-banging a GPIO with very short timings, down to several ns, where a peripheral can't be used.

How does caching influence the interrupt response time? Is there an ISB instruction at the beginning of each ISR? But then again, the caching would still take some time before the ISR is executed.

Is there any detailed information on this subject for the M0 and M3 cores?

Thanks

Best regards

Tesla DeLorean · 2019-10-20

ARM publishes Technical Reference Manuals for the cores; Joseph Yiu's books offer a more human interpretation of those.

Neither the CM0 nor the CM3 core has any cache of its own.

On some parts ST puts a cache (ART) in front of the flash, because the flash is inherently slow. They make the flash array very wide, typically 64 or 128 bits, so you might take a hit on the first fetch from a line, but the remaining fetches from that line cost nothing at the prefetch input. Often this is faster than a single-cycle RAM read.

The CM0 is very crippled to save transistors, and the flash on most ST CM0 parts doesn't have this feature. Beyond 24 MHz you have to use a wait state, so you'd need to clock a lot faster to get any speed improvement.
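
To put rough numbers on the wait-state point, here is a minimal sketch of a worst-case bound. It assumes every flash fetch pays the full wait-state penalty, with no cache and no prefetch buffer helping; the function name worst_case_fetch_khz is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case flash fetch rate in kHz.
 * Assumption: each fetch costs 1 cycle plus `wait_states` stall cycles,
 * and nothing (cache, prefetch) hides any of those stalls. */
uint32_t worst_case_fetch_khz(uint32_t clock_khz, uint32_t wait_states)
{
    assert(clock_khz > 0u);
    return clock_khz / (1u + wait_states);
}
```

Under that assumption, 24 MHz with 0 wait states and 48 MHz with 1 wait state both come out at 24000 fetches per millisecond, which is the sense in which doubling the clock buys nothing. Real parts can land above this bound, since one flash read can carry more than one 16-bit Thumb instruction.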

You'd have to model the caching, including the effect of evicting lines you subsequently need again.
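
As a rough idea of what such a model could look like, here is a host-side sketch that counts stall cycles for a loop fetched through a small line cache. Everything in it is an assumption for illustration: the direct-mapped organisation, the 16-byte (128-bit) line, the four lines, and the miss penalty do not describe ART or any specific part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16u  /* assumed 128-bit flash line */
#define NUM_LINES  4u   /* assumed cache size, deliberately tiny */

typedef struct {
    uint32_t tag[NUM_LINES];    /* which flash line each slot holds */
    uint8_t valid[NUM_LINES];   /* 0 until the slot is first filled */
    unsigned miss_penalty;      /* stall cycles per line fill */
    unsigned long stalls;       /* running total of stall cycles */
} line_cache;

void cache_init(line_cache *c, unsigned miss_penalty)
{
    memset(c, 0, sizeof *c);
    c->miss_penalty = miss_penalty;
}

/* Fetch one instruction address; a miss evicts whatever was in the slot. */
void cache_fetch(line_cache *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t slot = line % NUM_LINES;
    if (!c->valid[slot] || c->tag[slot] != line) {
        c->valid[slot] = 1u;
        c->tag[slot] = line;
        c->stalls += c->miss_penalty;
    }
}

/* Run a straight-line loop body of `len` bytes (2-byte Thumb instructions
 * assumed) `iterations` times and return the total stall cycles. */
unsigned long run_loop(line_cache *c, uint32_t start, uint32_t len,
                       unsigned iterations)
{
    assert(len % 2u == 0u);
    for (unsigned i = 0; i < iterations; i++) {
        for (uint32_t off = 0; off < len; off += 2u) {
            cache_fetch(c, start + off);
        }
    }
    return c->stalls;
}
```

With a miss penalty of 3, a 32-byte loop at 0x08000000 stalls only on its first pass, while a 96-byte loop spans six lines, so two pairs of lines keep evicting each other and it stalls on every pass. That is the eviction effect in miniature.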

Tail-chaining impacts IRQ service more, as it folds the context push/pop when it can go directly into another pending interrupt.

For a tight bit-bang loop, one would use a free-running TIM or DWT_CYCCNT to get reliable sub-microsecond granularity and interrupt immunity. Interrupts with long service times will clearly extend loops if you dwell excessively.
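
The free-running counter approach can be sketched as a wraparound-safe wait. The counter is read through a function pointer so the logic can be tried on a host; the names counter_read_fn, cycles_elapsed, wait_cycles and fake_counter are hypothetical. On a CM3 with CMSIS headers the reader would typically just return DWT->CYCCNT once the counter has been enabled; the CM0 has no CYCCNT, so a free-running TIM would take that role there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reader type: returns the current free-running count. */
typedef uint32_t (*counter_read_fn)(void);

/* Unsigned subtraction stays correct across a 32-bit counter wrap. */
uint32_t cycles_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Spin until at least `cycles` counts have passed since `start`.
 * Returns the counter value that ended the wait. */
uint32_t wait_cycles(counter_read_fn read_counter, uint32_t start,
                     uint32_t cycles)
{
    uint32_t now;
    assert(read_counter != 0);
    do {
        now = read_counter();
    } while (cycles_elapsed(start, now) < cycles);
    return now;
}

/* Host-side stand-in for the hardware counter: starts just below the
 * wrap point and advances 7 counts per read. */
static uint32_t fake_count = 0xFFFFFFF0u;

uint32_t fake_counter(void)
{
    fake_count += 7u;
    return fake_count;
}
```

For a train of edges, advancing a deadline by a fixed period (start += period) rather than re-reading the counter after each edge keeps an interrupt from shifting every later edge. A 16-bit timer would need the difference truncated to 16 bits.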
