Instruction prefetch and cache in STM32F7

GGran.2 · ‎2024-10-29

Hello,

in the STM32F7 it is possible to enable instruction prefetch and at the same time disable the ICache. I therefore suppose that prefetching does not put the prefetched instructions into the cache. Is that correct?

Thank you

FBL · ‎2024-10-29

Hi @GGran.2

The instruction prefetch and the instruction cache are separate features, and they can be controlled independently.

The prefetch buffer fetches instructions from the flash memory before they are needed by the CPU. This helps to reduce the latency associated with fetching instructions from flash memory, especially when the CPU is running at high speeds.

The instruction cache, in contrast, stores recently fetched instructions so that if the CPU needs to execute the same instructions again, it can retrieve them quickly from the cache instead of fetching them from the flash memory again.

Enabling prefetching does not directly place prefetched instructions into the instruction cache.
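To illustrate the "controlled independently" point, here is a minimal host-side sketch in C. It does not touch real hardware: `mock_regs_t` and all helper names are hypothetical, and the bit positions (FLASH_ACR.PRFTEN = bit 8, FLASH_ACR.ARTEN = bit 9, SCB CCR.IC = bit 17) are how I read the STM32F7 reference manual and the Cortex-M7 documentation, so please verify them for your exact part.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; check the reference manual for your device. */
#define MOCK_FLASH_ACR_PRFTEN (1u << 8)   /* prefetch enable             */
#define MOCK_FLASH_ACR_ARTEN  (1u << 9)   /* ART accelerator enable      */
#define MOCK_SCB_CCR_IC       (1u << 17)  /* L1 instruction cache enable */

/* Hypothetical stand-in for the two memory-mapped registers. */
typedef struct {
    uint32_t flash_acr;  /* image of FLASH->ACR */
    uint32_t scb_ccr;    /* image of SCB->CCR   */
} mock_regs_t;

/* Set or clear one bit mask in a register image. */
static void write_bit(uint32_t *reg, uint32_t mask, int on)
{
    if (on) {
        *reg |= mask;
    } else {
        *reg &= ~mask;
    }
}

/* The prefetch and ART bits live in the flash interface register... */
static void set_prefetch(mock_regs_t *r, int on)
{
    write_bit(&r->flash_acr, MOCK_FLASH_ACR_PRFTEN, on);
}

static void set_art(mock_regs_t *r, int on)
{
    write_bit(&r->flash_acr, MOCK_FLASH_ACR_ARTEN, on);
}

/* ...while the L1 I-cache enable bit lives in the core's own register. */
static void set_icache(mock_regs_t *r, int on)
{
    write_bit(&r->scb_ccr, MOCK_SCB_CCR_IC, on);
}

static int prefetch_enabled(const mock_regs_t *r)
{
    return (r->flash_acr & MOCK_FLASH_ACR_PRFTEN) != 0;
}

static int art_enabled(const mock_regs_t *r)
{
    return (r->flash_acr & MOCK_FLASH_ACR_ARTEN) != 0;
}

static int icache_enabled(const mock_regs_t *r)
{
    return (r->scb_ccr & MOCK_SCB_CCR_IC) != 0;
}

/* Self-check: prefetch on, I-cache off is a representable combination. */
static void check_prefetch_without_icache(void)
{
    mock_regs_t r = {0u, 0u};
    set_prefetch(&r, 1);
    assert(prefetch_enabled(&r));
    assert(!icache_enabled(&r));
}
```

The only point of the sketch is that the enable bits sit in different registers, so any combination can be selected. On a real target the L1 I-cache would normally be enabled through the CMSIS `SCB_EnableICache()` helper, which also invalidates the cache and issues the required barriers, rather than by writing the bit directly.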

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

waclawek.jan · ‎2024-10-29

The prefetch is part of ART which is on the (I)TCM bus, not the AXI matrix where the L1 cache is located.

The cache subsystem performs speculative fetches, too, akin to prefetch; this is not particularly well documented. Neither is arbitration within the FLASH interface documented; I would presume TCM accesses take precedence over any other access.

JW

GGran.2 · ‎2024-10-29

ART performs prefetch, but the Cortex-M7 also performs prefetch without ART, if prefetch is enabled. In fact I can disable ART and enable prefetch. In the document https://www.st.com/content/ccc/resource/training/technical/product_training/group0/8d/11/c5/6d/be/15/4f/63/STM32H7-System-ARM_Cortex_M7_M7/files/STM32H7-System-ARM_Cortex_M7_M7.pdf/_jcr_content/translations/en.STM32H7-System-ARM_Cortex_M7_M7.pdf on page 4 there is a figure in which the prefetch unit is linked to the I-Cache.

GGran.2 · ‎2024-10-29

Thank you!

waclawek.jan · ‎2024-10-29

@FBL,

I would like to ask for a bit more precision in the wording here.

There are two caches in this regard: one is the 64 lines of ART, which is on the ITCM feed from FLASH; the other is the L1 cache on the AXI-processor interface. It's the latter which is usually called ICache, and that's probably what @GGran.2 had in mind when writing about disabling ICache.

Prefetch, as explicitly enabled by FLASH_ACR.PRFTEN, is part of ART and thus on ITCM. Lines fetched by the prefetch may end up in ART, but only when actually fetched by the processor - that's probably what you are talking about above. One extra note: it's not only *instructions* which are fetched through ITCM, but also *data literals*, and both end up in ART if actually read by the processor.

As ART is on the ITCM, lines prefetched by the prefetch in ART can't end up in the L1 cache, as they don't reach/traverse the AXI matrix.

However, the L1 cache itself can perform speculative reads, which from a "traffic" point of view is the same thing as prefetch, i.e. it accesses FLASH without having an acute need for the fetched line. It does this in order to actually store that line into the cache, and does so without the processor actually reading it, in the anticipation that the processor may do so in the future, which then might spare the waitstates associated with FLASH and any other latencies across the AXI matrix. It does not do this if it's disabled, of course.
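Which of the two routes a FLASH access takes depends on the address alias it uses. As a rough illustration, here is a small host-side C sketch that classifies an address. The base addresses (0x00200000 for FLASH on ITCM, 0x08000000 for FLASH on AXIM) are taken from the STM32F7 memory map as I understand it; the 1 MB size and all names in the code are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_ITCM_BASE    0x00200000u  /* FLASH alias on ITCM (ART side)     */
#define FLASH_AXIM_BASE    0x08000000u  /* FLASH alias on AXIM (L1 cache side) */
#define ASSUMED_FLASH_SIZE 0x00100000u  /* assumption: a 1 MB device           */

typedef enum {
    PATH_ITCM,      /* goes through ART and its prefetch (FLASH_ACR bits) */
    PATH_AXIM,      /* goes through the AXI matrix and the L1 cache       */
    PATH_NOT_FLASH  /* outside both FLASH aliases                         */
} fetch_path_t;

/* True if addr lies inside the FLASH window that starts at base. */
static int in_window(uint32_t addr, uint32_t base)
{
    return addr >= base && (addr - base) < ASSUMED_FLASH_SIZE;
}

/* Classify an address by the FLASH alias it falls into. */
static fetch_path_t flash_fetch_path(uint32_t addr)
{
    if (in_window(addr, FLASH_ITCM_BASE)) {
        return PATH_ITCM;
    }
    if (in_window(addr, FLASH_AXIM_BASE)) {
        return PATH_AXIM;
    }
    return PATH_NOT_FLASH;
}

/* Translate an AXIM-alias FLASH address to the matching ITCM alias. */
static uint32_t axim_to_itcm_alias(uint32_t addr)
{
    assert(in_window(addr, FLASH_AXIM_BASE));
    return addr - FLASH_AXIM_BASE + FLASH_ITCM_BASE;
}
```

For example, `flash_fetch_path(0x08000100u)` yields `PATH_AXIM`, while the same FLASH location seen through the other alias, `axim_to_itcm_alias(0x08000100u)`, is `0x00200100u` and classifies as `PATH_ITCM`. In practice the alias is chosen by where the linker script places the code, not at run time.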

JW

GGran.2 · ‎2024-10-29

@waclawek.jan thank you for the explanation. You wrote that prefetch is part of ART and therefore on ITCM. But I can enable prefetch also if I set AXI as interface.

GGran.2 · ‎2024-10-29

Besides, I can enable speculation with the L1 ICache disabled. So I suppose that if both speculation and the ICache are enabled, the instructions fetched by speculation are put into the ICache; if speculation is enabled but the ICache is disabled, speculation is still permitted but the fetched instructions are not put into the cache.

waclawek.jan · ‎2024-10-29

> But I can enable prefetch also if I set AXI as interface.

What do you mean by "set AXI as interface", how do you do that? And how do you "enable prefetch"?

Unfortunately, there are several mechanisms working concurrently, mutually interacting, and mostly undocumented. For example, the processor itself performs prefetch into a 4x64-bit prefetch queue, and it's not clear whether that prefetch counts in ART as "consumption" or not (in which case there would need to be some dedicated signalling from processor to ART, which I doubt there is).

To be honest, I am not much interested in the Cortex-M7 features other than out of pure curiosity. I don't use them, and were I ever forced to use them, I'd do so from the explicit standpoint that they may be fast number crunchers but are generally unsuitable for any close-to-cycle-precise control work.

JW