I'm experimenting with an STM32L011K4 (on a NUCLEO-L011K4) to test whether the SLEEPONEXIT feature (ARMv6-M, Cortex-M0+) significantly reduces ISR latency in an ISR-only software design.
My assumption (based on multiple sources) is that after setting SLEEPONEXIT, entering the first ISR requires full stacking (as in a regular ISR entry), but after that, stacking/unstacking is not needed, since Cortex-M tail-chaining would be active for every subsequent ISR. A figure that I seem to remember is 6 SYSCLKs between ISRs.
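For context, a minimal sketch of how SLEEPONEXIT is typically enabled, not necessarily the exact code used in these tests. The bit positions are those of the ARMv6-M System Control Register (SCB->SCR); the helper names are hypothetical, and the register is passed as a pointer only so the sketch can also be exercised off-target.

```c
#include <stdint.h>

/* Bit positions in the ARMv6-M System Control Register (SCB->SCR at 0xE000ED10). */
#define SCR_SLEEPONEXIT (1u << 1)  /* re-enter sleep when returning from an ISR to Thread mode */
#define SCR_SLEEPDEEP   (1u << 2)  /* select deep sleep; left untouched by the helpers below */

/* Hypothetical helper: set SLEEPONEXIT without disturbing the other SCR bits.
 * On target, pass &SCB->SCR (CMSIS); on a host, pass a plain variable. */
static inline void sleep_on_exit_enable(volatile uint32_t *scr)
{
    *scr |= SCR_SLEEPONEXIT;
}

/* Hypothetical helper: clear SLEEPONEXIT, e.g. from an ISR that wants
 * Thread mode to resume after it returns. */
static inline void sleep_on_exit_disable(volatile uint32_t *scr)
{
    *scr &= ~SCR_SLEEPONEXIT;
}
```

On target this would be followed by a single `__WFI()` after the interrupt sources are configured, e.g. `sleep_on_exit_enable(&SCB->SCR); __WFI();`, after which Thread mode is not expected to run again.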
This is not what I'm seeing unfortunately. The difference in ISR-entry latency is just one SYSCLK (from 14 SYSCLKs to 13 SYSCLKs before ISR code is run) when SLEEPONEXIT is active. I am using default post-reset clocks and PWR settings and driving SYSCLK on an external pin against which I can measure things with an external logic analyzer.
My first test used PB0/EXTI0 as a wakeup line, with external 3V3 connected to the pin. Everything works in this mode, although there seems to be some extra latency (3-4 SYSCLKs) between the signal change and EXTI generating the wakeup interrupt. Using SLEEPONEXIT in this mode was disappointing: it only yielded a reduction of 1 SYSCLK compared to a WFI loop without SLEEPONEXIT. The interval between ISRs is such that the previous ISR ends and there is quite a bit of time before the next one triggers.
Thinking that perhaps EXTI has some additional limitations, I next set up a scenario where LPUART1 runs off SYSCLK with the lowest possible divisor (resulting in a baud rate of fck/3). Only TX is enabled, with the TC interrupt. The ISR toggles a GPIO, pushes a byte to TDR and toggles the GPIO again. While everything else worked without surprises, SLEEPONEXIT again disappointed by not yielding any latency savings. Here again, the interval between ISRs is such that the previous ISR completes before the next one triggers.
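As a rough estimate of the time budget in this LPUART test, the sketch below computes how many kernel clocks one transmitted frame takes. It relies on the reference manual relation baud = 256 * fck / LPUARTDIV with a minimum LPUARTDIV of 0x300; the 10-bit frame (1 start, 8 data, 1 stop) is an assumption, as the frame format is not stated above, and the function name is hypothetical.

```c
#include <stdint.h>

/* Smallest LPUARTDIV value the reference manual allows in LPUART_BRR. */
#define LPUART_BRR_MIN 0x300u

/* baud = 256 * fck / LPUARTDIV, so one bit lasts LPUARTDIV / 256 kernel clocks.
 * Returns the number of kernel clocks needed to shift out one frame. */
static inline uint32_t lpuart_clocks_per_frame(uint32_t lpuartdiv,
                                               uint32_t bits_per_frame)
{
    return (bits_per_frame * lpuartdiv) / 256u;
}
```

With `LPUART_BRR_MIN` and an assumed 10-bit frame this gives 30 clocks per frame, which is the window in which ISR entry, the ISR body and ISR exit all have to fit if each ISR is to finish before the next TC event.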
Is there a known deficiency in the implementation of tail-chaining on the L011 series, or have I misunderstood the sources, and SLEEPONEXIT does not actually activate tail-chaining? Or are there additional conditions that need to be true in order to activate it? Or perhaps this optimization is not available on CM0+ or on the L011?
I will next attempt to replicate the setup on Nucleo-F070RB (CM0) (hoping that the hardware isn't too different) and if that also fails, will replicate the setup on KW41Z (the only other CM0+ that I have available).
Suggestions appreciated and thanks for reading :-)
EDIT: Found a Nucleo-L073RZ as well, and it has the same issue (although I would imagine its core to be close to identical to the one in the L011 with respect to SLEEPONEXIT).