
M7 Speculative access vs cache

Pavel A.
Evangelist III

Dear experts,

Still not sure after reading the fine manuals.

On the CM7 (STM32H7, F7), if the I- and D-caches have never been enabled after reset, can speculative data fetches and instruction fetches still occur?

Is speculative access independent of caching?

Accepted solution

I'm no expert, so I won't answer, but I'll ask: why do you ask?

I'm asking because maybe you are not specifically after the things that ARM calls "speculative".

Yes, if you disable the caches, there will be no cache line fills/evictions. However, accesses to a memory area tagged Normal may still be merged or reordered, and read accesses may still have a different width than the one requested by the processor (e.g. an LDRB into the FMC over AXIM may result in a 64-bit read, and an FMC configured for an 8-bit bus will translate that into 8 reads). On a non-STM32, Cortex-M3-based MCU which, for inexplicable reasons, had its GPIO allocated in a Normal area, I've seen a bit-banding write merged with a subsequent direct write to the same GPIO (in other words, my bit-banged SPI did not work).
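To make the width point concrete, here is a host-side C model (not STM32 code) of a hypothetical device whose reads have side effects, such as a FIFO data register where every bus cycle pops one byte. It compares a byte load that reaches the device as a single 8-bit cycle with the same load widened to 64 bits and split by an 8-bit external bus into 8 cycles, as described above. The device, its FIFO and all function names are assumptions for illustration; the model does not try to reproduce the real AXI or FMC logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device whose data register has read side effects:
 * every bus cycle pops the next byte of an internal FIFO. */
typedef struct {
    uint8_t fifo[32];
    size_t head;       /* index of the next byte to pop */
    size_t bus_reads;  /* bus cycles the device has seen */
} fifo_dev;

/* One 8-bit cycle on the external bus. */
uint8_t dev_bus_read8(fifo_dev *dev)
{
    assert(dev->head < sizeof dev->fifo);
    dev->bus_reads++;
    return dev->fifo[dev->head++];
}

/* A byte load (LDRB) that reaches the device as issued: one cycle. */
uint8_t ldrb_as_issued(fifo_dev *dev)
{
    return dev_bus_read8(dev);
}

/* The same LDRB on a Normal region, widened to a 64-bit read that an
 * 8-bit bus splits into 64 / 8 = 8 cycles; only one byte is returned
 * to the CPU, the other seven are consumed and discarded. */
uint8_t ldrb_widened(fifo_dev *dev, unsigned byte_lane)
{
    uint8_t dword[8];
    assert(byte_lane < 8);
    for (unsigned i = 0; i < 8; i++)
        dword[i] = dev_bus_read8(dev);
    return dword[byte_lane];
}
```

In the widened case, two byte loads consume 16 FIFO entries, and the second one returns the ninth byte the device delivered instead of the second. As far as I understand the ARMv7-M memory model, mapping such a region as Device or Strongly-ordered is what keeps the size and number of accesses as issued.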

Instruction fetches are a whole other world, and they are not supposed to go to anything sensitive to ordering, width or the like, as there are prefetches, speculative branching, abandoned/restarted instructions, and maybe something else I've forgotten.

JW


KnarfB
Principal III

No idea how to prove or disprove that for the CM7 if it's not in the manuals. On Arm Cortex-A, performance counters were used to monitor mispredicted branches etc. Speculation is a feature of the microarchitecture and is not per se directly related to the cache. Documents like https://developer.arm.com/documentation/ddi0489/f/memory-system/speculative-accesses also don't mention such a dependency.

So I would guess: yes, they can still occur.

just my two cents

KnarfB

Jan, thank you for your question )

I ask because, quite a while ago, ST released guidance about the possible effects of speculative accesses to external memory zones (yes, FMC and so on), and they recommend always using the MPU to sanitize unused zones of the address space.

I've shipped a few projects for the H7 without following that guidance (tested, of course...), but now customers want to add external memories on FMC and QSPI, or change the memories to different models. Sometimes they see weird bugs.

For one, the whole MCU hangs while erasing a sector of NOR flash on FMC when using a certain library for the NOR flash, but not with another one.

So you are saying these things can be caused simply by accesses of a different width and by reordering.

Reordering can be cured without the MPU, by placing memory barriers. But the width changes cannot? So I will have to do as ST recommends...
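One common way to follow that guidance on an ARMv7-M MPU is a background region covering the whole 4 GB as no-access, Strongly-ordered and execute-never, plus higher-numbered regions that open up only the memories actually fitted (on overlap, the highest-numbered region wins). The sketch below only computes MPU_RASR values from the architectural field layout, so it runs on a host; on the target you would select the region in MPU->RNR, write the base address to MPU->RBAR and the computed value to MPU->RASR, then enable the MPU via MPU->CTRL. The helper name, the three memory types offered, and the use of 0x60000000 as the FMC NOR/PSRAM bank base in the usage note are assumptions; check them against your part's reference manual and ST's guidance.

```c
#include <assert.h>
#include <stdint.h>

/* MPU_RASR field positions (ARMv7-M, as on Cortex-M7). */
#define RASR_XN_Pos    28u
#define RASR_AP_Pos    24u
#define RASR_TEX_Pos   19u
#define RASR_S_Pos     18u
#define RASR_C_Pos     17u
#define RASR_B_Pos     16u
#define RASR_SRD_Pos    8u
#define RASR_SIZE_Pos   1u
#define RASR_ENABLE     1u

/* Subset of memory types, with their ARMv7-M TEX:C:B encodings. */
typedef enum {
    MEM_STRONGLY_ORDERED,     /* TEX=000 C=0 B=0 */
    MEM_DEVICE_SHARED,        /* TEX=000 C=0 B=1, S=1 */
    MEM_NORMAL_NONCACHEABLE   /* TEX=001 C=0 B=0 */
} mem_type;

/* Hypothetical helper. size_log2: region is 2^size_log2 bytes (5..32).
 * ap: access permission field (0 = no access, 3 = full access).
 * xn: nonzero forbids instruction fetches from the region.
 * srd: subregion disable mask. */
uint32_t mpu_rasr(unsigned size_log2, mem_type type, unsigned ap, int xn,
                  uint8_t srd)
{
    uint32_t tex = 0u, c = 0u, b = 0u, s = 0u;

    assert(size_log2 >= 5u && size_log2 <= 32u);
    assert(ap <= 7u);

    switch (type) {
    case MEM_STRONGLY_ORDERED:                      break;
    case MEM_DEVICE_SHARED:       b = 1u; s = 1u;   break;
    case MEM_NORMAL_NONCACHEABLE: tex = 1u;         break;
    }

    return ((uint32_t)(xn ? 1u : 0u) << RASR_XN_Pos)
         | ((uint32_t)ap << RASR_AP_Pos)
         | (tex << RASR_TEX_Pos)
         | (s << RASR_S_Pos)
         | (c << RASR_C_Pos)
         | (b << RASR_B_Pos)
         | ((uint32_t)srd << RASR_SRD_Pos)
         | ((uint32_t)(size_log2 - 1u) << RASR_SIZE_Pos)  /* SIZE = log2 - 1 */
         | RASR_ENABLE;
}
```

For example, mpu_rasr(32, MEM_STRONGLY_ORDERED, 0, 1, 0) gives 0x1000003F for the background region, and mpu_rasr(28, MEM_DEVICE_SHARED, 3, 1, 0) gives 0x13050037 for a 256 MB full-access Device window at the assumed FMC bank base. Device (rather than Normal) is what stops merging and widening there, and XN keeps instruction fetches away from it.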

Changing the memory structure is, IMO, a big enough change, both electrically and in how all the involved logic works, that preexisting bugs may be revealed which have no relationship to "non-rudimentary" accesses.

JW

Piranha
Chief II

> Speculation is a feature of the microarchitecture and per se not directly related to the cache.

This! I can add that branch prediction and even dual-issue execution can be disabled in the Auxiliary Control Register, but speculative accesses cannot.

By the way, Arm has announced the Cortex-M85.

"The M85 is using dual instruction issue with selective triple issue with intelligent control for the branch prediction. This boosts the performance by 30% over the M7 core and 85% over the M55."

https://www.eenewseurope.com/en/arm-cortex-m85-in-voice-recognition-subsystem/

Triple issue... If ST implements such a core in an MCU, the broken Cube/HAL bloatware will be as bad as it is on the M7, but will fail even more often...

> dual issue execution can be disabled ...but not the speculative accesses

Intel's bitter experience (Spectre, Meltdown) apparently did not serve as a hint to them...

> as bad as it is for the M7, but will fail even more often

Why? More load/store reordering?

> Spectre, Meltdown

Cortex-M7 was released in 2014, long before those vulnerabilities were discovered. In any case, for 99% of microcontroller projects this class of vulnerabilities is not relevant, because they don't run externally installed application code that could exploit it.

> More load/store reordering

Yes. The chance that the CPU will find something that can be reordered or done simultaneously will be higher. By the way...

hDescRxSet_->Status = ETH_RDES0_OWN;  // Hand the descriptor back to the DMA.
__DMB();                              // Keep the OWN store ahead of the register write below.
ETH->DMARPDR = (uint32_t)ETH;         // Any value issues a descriptor list poll demand.

A long time ago I had this code actually failing without the DMB instruction. In my code the descriptor memory is of Normal type, so the CPU can reorder accesses to it. Without the DMB, at least sometimes, it wrote the ETH register before storing the OWN bit. Since I had not yet learned D-cache management at that time, this was happening on an F7 with the D-cache disabled.
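The pattern in Piranha's snippet (fill the descriptor, set OWN last, barrier, then kick the DMA) can be sketched in plain C as follows. The descriptor and register structs are hypothetical stand-ins, not the HAL's types; RDES0_OWN is assumed to be bit 31 as in the F7 Ethernet RX descriptor; and on a non-Arm host the barrier macro falls back to a C11 fence so the sketch compiles and runs. That fallback only shows the call site: what the DMA engine observes on the target depends on the real DMB. Note that volatile alone does not fix this bug, since it stops the compiler from reordering the accesses but not the Cortex-M7 from reordering Normal-memory stores relative to the peripheral write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#if defined(__arm__)
/* Arm target: data memory barrier; the "memory" clobber also stops
 * the compiler from moving accesses across it. */
#define PUBLISH_BARRIER() __asm volatile ("dmb" ::: "memory")
#else
/* Host build only: a C11 fence so the sketch compiles and runs. */
#define PUBLISH_BARRIER() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Assumed: OWN bit of RX descriptor word 0 (RDES0), bit 31. */
#define RDES0_OWN (1u << 31)

/* Hypothetical stand-in for an RX DMA descriptor in Normal memory. */
typedef struct {
    volatile uint32_t status;     /* RDES0: OWN and status bits */
    volatile uint32_t ctrl;       /* RDES1: buffer size and control bits */
    volatile uint32_t buf_addr;   /* RDES2: buffer address */
    volatile uint32_t next_desc;  /* RDES3: next descriptor or buffer 2 */
} rx_desc;

/* Hypothetical stand-in for the DMA receive poll demand register. */
typedef struct {
    volatile uint32_t rx_poll_demand;  /* any written value starts a poll */
    uint32_t pokes;                    /* host-only: counts poll demands */
} fake_eth_dma;

void rx_desc_give_to_dma(rx_desc *d, uint32_t buf_addr, uint32_t ctrl,
                         fake_eth_dma *dma)
{
    assert(d && dma);
    /* 1. Everything the DMA will read is written first... */
    d->buf_addr = buf_addr;
    d->ctrl = ctrl;
    /* 2. ...and ownership is handed over last. */
    d->status = RDES0_OWN;
    /* 3. Keep the OWN store ahead of the poll demand below. */
    PUBLISH_BARRIER();
    /* 4. Poll demand: the DMA re-reads the descriptor list. */
    dma->rx_poll_demand = 0u;
    dma->pokes++;
}
```

Wrapping the hand-over in one function keeps the barrier in a single place instead of relying on every call site to remember it.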

Very useful for me, thanks. Live and learn, as they say... /* and die a fool */