Can I subvert the context switching for STM32 interrupt handling?

JCase.1
Associate III

The STM32 family has fantastic interrupt handling: the hardware stacks a whole slew of registers for you, and loads the LR with an artificial return value so that unstacking happens properly, while looking for opportunities for tail chaining, aborted entry, etc. HOWEVER... it is too damn slow. I am finding (STM32F730Z8, 200 MHz clock, all code including handlers in ITCM, everything in GNU assembly) that it takes about 120-150 ns of overhead to get into an interrupt.

I am still learning about these parts; I'm used to the old ARM7, where you had to do it all yourself, but on those chips a minimal handler didn't need to stack much. So -- can I "subvert" the context switching in hardware and just have it leap to the handler at elevated priority, pausing only to refill the pipeline, leaving me to stack only what is needed? I don't think so, and I haven't seen a way to do it, but I'm working on extremely tight, time-sensitive realtime code, and interrupt switching is eating my whole time budget. I'm reverting to doing it all in polled code, but I hate the jitter that gives me in response to pin edges. Help?
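
To make concrete what I'm after, here is a rough sketch in C (my real code is GNU assembly; the EXTI line, pin and handler name are placeholders) of the kind of minimal handler I have in mind. As far as I can tell, though, the stacking of R0-R3, R12, LR, PC and xPSR is done by the NVIC hardware no matter what, so a "naked" handler only saves the compiler's own prologue:

```c
#include "stm32f7xx.h"    /* CMSIS device header, assumed available */

/* GCC's "naked" attribute removes the compiler prologue/epilogue, so the
 * body below is the entire handler.  The hardware has already stacked
 * R0-R3, R12, LR, PC and xPSR before the first instruction executes, and
 * that part cannot be turned off on any Cortex-M core.                  */
__attribute__((naked)) void EXTI15_10_IRQHandler(void)
{
    __asm volatile(
        /* R0-R3 and R12 are free to use: the hardware frame preserved them */
        "movw  r0, #0x3C14        \n"   /* EXTI_PR = 0x40013C14 on the F7   */
        "movt  r0, #0x4001        \n"
        "mov   r1, #(1 << 13)     \n"   /* EXTI line 13: placeholder        */
        "str   r1, [r0]           \n"   /* clear the pending flag           */
        /* ... time-critical work here ...                                  */
        "bx    lr                 \n"   /* LR = EXC_RETURN: hardware unstacks */
    );
}
```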

(cross posted to stack overflow)

2 REPLIES

The ISR entry sequence takes 12 cycles, plus whatever latency the interrupted process imposes (multicycle instructions, interrupts explicitly disabled, a higher-priority interrupt already running, etc.), plus the latencies of all memories and buses involved.

> it takes about 120-150 ns overhead to get into an interrupt.

How do you know? How do you observe that, exactly?

If the 200 MHz system clock is real (did you confirm that? how?), 120-150 ns works out to 24-30 cycles, which sounds a tad more than expected, but still not unimaginable.

Where is the vector table? What's VTOR setting? Where is the stack?

Generally, you should not be concerned about ISR latency, as you should strive to handle as many timing-sensitive issues in hardware as possible. And these MCUs have plenty of hardware for that.
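
For example, a timer input-capture channel timestamps a pin edge entirely in hardware, so the exact edge time no longer depends on ISR or polling latency. A rough register-level sketch (the timer, channel and pin are example choices, not taken from your setup):

```c
#include "stm32f7xx.h"    /* CMSIS device header, assumed available */

/* Sketch: timestamp the falling edge of TIM2 channel 1 (PA0 is one routing
 * option for TIM2_CH1).  The timer latches the count into CCR1 at the edge,
 * so the edge time is exact even if the CPU services the interrupt late.   */
void capture_init(void)
{
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    RCC->APB1ENR |= RCC_APB1ENR_TIM2EN;

    /* PA0 -> alternate function 1 (TIM2_CH1) */
    GPIOA->MODER  = (GPIOA->MODER & ~GPIO_MODER_MODER0) | GPIO_MODER_MODER0_1;
    GPIOA->AFR[0] = (GPIOA->AFR[0] & ~0xFu) | 0x1u;

    TIM2->PSC   = 0;                    /* count at the full timer clock    */
    TIM2->ARR   = 0xFFFFFFFFu;          /* free-running 32-bit counter      */
    TIM2->CCMR1 = TIM_CCMR1_CC1S_0;     /* CC1 as input, mapped on TI1      */
    TIM2->CCER  = TIM_CCER_CC1P         /* capture on the falling edge      */
                | TIM_CCER_CC1E;        /* enable the capture               */
    TIM2->DIER  = TIM_DIER_CC1IE;       /* interrupt on capture (or poll)   */
    TIM2->CR1   = TIM_CR1_CEN;

    NVIC_EnableIRQ(TIM2_IRQn);
}

void TIM2_IRQHandler(void)
{
    uint32_t edge_time = TIM2->CCR1;    /* reading CCR1 also clears CC1IF   */
    (void)edge_time;
    /* ... act on the hardware-timestamped edge ... */
}
```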

JW

JCase.1
Associate III

How do I observe? I poke unused pins at entry and exit of the interrupt handler (and at crucial points in non-interrupt code) and look at them on a scope.
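
For what it's worth, the measurement looks roughly like this (sketched in C with an arbitrary probe pin and EXTI line; the real code is assembly, but the idea is the same: one store to BSRR at entry, one at exit):

```c
#include "stm32f7xx.h"    /* CMSIS device header, assumed available */

/* PB0 is an arbitrary probe pin, already configured as a push-pull output. */
#define PROBE_HIGH()  (GPIOB->BSRR = (1u << 0))    /* set PB0   */
#define PROBE_LOW()   (GPIOB->BSRR = (1u << 16))   /* reset PB0 */

void EXTI15_10_IRQHandler(void)      /* whichever handler is being timed */
{
    PROBE_HIGH();                    /* entry marker: a single store     */

    EXTI->PR = EXTI_PR_PR13;         /* clear the pending bit (line 13
                                        is a placeholder)                */
    /* ... the actual work ... */

    PROBE_LOW();                     /* exit marker                      */
}
```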

ISR entry is ***supposed*** to take 12 cycles... that is what I read and what I planned on when writing the code, but it is not what I see. I think there is extra "junk" happening in microcode while setting up the context save... I'm seeing well over twice the delay I should, according to the specs.

200 MHz? That is what I set up with the PLL clocking, and that is what is confirmed by running any of the clocks directly to output pins.
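
Routing the PLL clock to a pin for that check can be done along these lines (the HAL call is used purely as an illustration, since my actual code is register-level assembly; the divide-by-4 is just to keep the output scope-friendly):

```c
#include "stm32f7xx_hal.h"   /* ST HAL, assumed available for this sketch */

/* Drive PLLCLK/4 out on PA8 (MCO1): 50 MHz on the scope if SYSCLK really
 * is 200 MHz.  The HAL call also configures PA8 for the MCO alternate
 * function.                                                              */
void clock_check_init(void)
{
    HAL_RCC_MCOConfig(RCC_MCO1, RCC_MCO1SOURCE_PLLCLK, RCC_MCODIV_4);
}
```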

I copy all code (< 16 kB) from flash down to 0x00000000 right after boot, and then adjust the PC to suit. The speed increase is verified by some simple fiducials (pins written up/down). Not rocket surgery. VTOR is set to 0x00000000; the stack is at the top of DTCM. Each of these steps was verified on the scope to produce the expected speed-up when implemented... so it is not likely a failure to run instructions at 200 MHz with zero wait states.
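
In rough C terms the startup relocation does something like this (the symbol names are made up for illustration; the real version is assembly and the copy size comes from the linker script):

```c
#include <stdint.h>
#include <string.h>
#include "stm32f7xx.h"            /* CMSIS device header, assumed available */

/* These would normally be linker-script symbols; the names are placeholders. */
extern uint32_t _itcm_code_lma;   /* load address of the code in flash       */
extern uint32_t _itcm_code_size;  /* size of the region to copy, in bytes    */

#define ITCM_BASE  0x00000000UL   /* 16 kB ITCM-RAM on the F730              */

void relocate_to_itcm(void)
{
    /* copy vectors + code from flash into ITCM */
    memcpy((void *)ITCM_BASE, &_itcm_code_lma, (size_t)&_itcm_code_size);

    /* point the vector table at the ITCM copy */
    SCB->VTOR = ITCM_BASE;

    /* make sure the copy and the new table are visible before the next IRQ */
    __DSB();
    __ISB();

    /* the main stack pointer (top of the 64 kB DTCM on this part) is set
       from the first vector-table entry at reset; nothing more to do here  */
}
```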

I welcome your advice as to how to move more of the computational burden to hardware. (I've already changed from interrupt on both edges to interrupt on falling edge only, and latched the key up/down flag on the rising edge in a nearby CPLD. I'm running out of options.)

Thanks!

Jeff