2022-09-30 02:46 AM
I can't find any documentation on the STM32H7's instruction prefetch behavior when executing from external QSPI/OCTOSPI flash.
My use case/problem: I have a piece of startup code (executed from external QSPI) that copies the main application into executable RAM. It is basically a single loop of about three instructions.
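For illustration, such a copy loop could look roughly like the following in C. This is only a minimal sketch, assuming word-aligned source and destination buffers; the name `copy_words` and its parameters are hypothetical, not taken from the actual startup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-wise copy from memory-mapped flash to RAM.
 * The loop body compiles to a handful of instructions:
 * one load, one store, and the loop control. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    assert(dst != NULL && src != NULL);
    while (n_words--) {
        *dst++ = *src++;  /* data read from flash competes with instruction fetch */
    }
}
```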
Depending on the alignment of my loop code, the prefetching "breaks" and I observe instruction fetches on every loop iteration. This completely ruins performance, and the loop is especially sensitive to it because the instruction fetches compete with the normal data accesses to the same flash.
What do I need to do to prevent this reliably (or how/where do I find out)?
From my testing so far, aligning the first loop instruction by padding with NOPs solves this, but what alignment exactly do I need? I assume the loop body must not cross certain alignment boundaries.
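To make the "must not cross a boundary" assumption concrete, here is a small host-side helper for checking a candidate alignment against the loop's address and size taken from the map file. It is only a sketch: the helper name `crosses_boundary` is made up, and the boundary size to test (8, 16, 32 bytes, ...) is a guess, since the real prefetch granularity is exactly what I don't know.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the byte range [start, start + len) spans more than
 * one align-byte block. align must be a power of two. */
bool crosses_boundary(uint32_t start, uint32_t len, uint32_t align)
{
    assert(len > 0);
    assert(align != 0 && (align & (align - 1)) == 0);

    uint32_t first_block = start & ~(align - 1);             /* block holding the first byte */
    uint32_t last_block  = (start + len - 1) & ~(align - 1); /* block holding the last byte */
    return first_block != last_block;
}
```

For example, an 8-byte loop body at the arbitrary address 0x9000001C crosses a 32-byte boundary, while the same body padded to start at 0x90000020 does not.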