Memory/Instruction barriers before writing to the backup SRAM

lutztonineubert · 2020-08-05

The following code didn't work because of a missing data barrier before writing to the backup SRAM:

HAL_PWR_EnableBkUpAccess();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);
HAL_PWR_DisableBkUpAccess();

Adding a DSB solved the issue for now.

HAL_PWR_EnableBkUpAccess();
__DSB();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);
HAL_PWR_DisableBkUpAccess();

It is understandable that the enable BkUp call (which sets a single bit) needs to be fully completed before writing to the actual memory addresses.

But it is not clear to me whether a DMB would be enough (it also works), and whether I need an additional ISB so that the enable and disable don't both happen before the actual copy, like so:

HAL_PWR_EnableBkUpAccess();
__DSB();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);
__ISB();
HAL_PWR_DisableBkUpAccess();

Can someone help me out here and tell me what the correct way would be?
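
For reference, the sequence in question can be wrapped in a scoped guard so that enable, barrier, copy, and disable always occur in one fixed order. This is only a sketch under assumptions: `HostStubs`, `BackupAccessGuard`, and `write_backup` are hypothetical names, and the stubs merely log calls so the code can run on a host. On the target they would forward to `HAL_PWR_EnableBkUpAccess()`, `__DSB()` (or whichever barrier turns out to be appropriate), and `HAL_PWR_DisableBkUpAccess()`. The sketch does not settle which barrier is correct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins for the target-specific calls.
// On the target: HAL_PWR_EnableBkUpAccess(), __DSB(), HAL_PWR_DisableBkUpAccess().
struct HostStubs {
    static std::vector<std::string>& log() {
        static std::vector<std::string> entries;
        return entries;
    }
    static void enable()  { log().push_back("enable"); }
    static void barrier() { log().push_back("barrier"); }
    static void disable() { log().push_back("disable"); }
};

// Scoped guard: enable + barrier on construction, barrier + disable on destruction.
template <typename Port>
class BackupAccessGuard {
public:
    BackupAccessGuard()  { Port::enable();  Port::barrier(); }
    ~BackupAccessGuard() { Port::barrier(); Port::disable(); }
    BackupAccessGuard(const BackupAccessGuard&) = delete;
    BackupAccessGuard& operator=(const BackupAccessGuard&) = delete;
};

// Copies num_bytes from buffer to base + offset while the guard is alive.
template <typename Port>
void write_backup(std::uint8_t* base, std::size_t offset,
                  const std::uint8_t* buffer, std::size_t num_bytes) {
    assert(base != nullptr && buffer != nullptr);
    BackupAccessGuard<Port> guard;
    std::copy(buffer, buffer + num_bytes, base + offset);
}
```

Making the port a template parameter keeps the ordering logic testable off-target. Whether the closing barrier is needed at all is part of the open question above.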

waclawek.jan · 2020-08-05

> Are you sure that the CPU cannot see the completion of access

This will be an exercise in modal verbs... It *may* see them.

First, as you are all already painfully aware, these are SoCs rather than microcontrollers, i.e. not peripherals tightly integrated around a core and sharing its clock, but IPs slapped onto the core through a bus fabric whose transactions are based on handshakes. If a peripheral can't store/retrieve data immediately, it signals WAIT, and this is propagated back to the originator. To avoid slowdowns from slower peripherals/sub-buses (e.g. APB), buffers (FIFOs, usually one transaction deep) are inserted. WAIT is then propagated back through a buffer only if it is full.

The barriers outwardly wait until all WAITs cease (they also perform some tasks internal to the processor and its nearest kin). If a write still sits in a buffer, the processor does not know about it. There may be some extra "are all buffers empty?" signal from the processor, but I doubt there is any.

So, barriers on straightforward memories will work. On APB buses they probably won't. On the intermatrix interconnects in H7, I don't know and don't care.

JW

Pavel A. · 2020-08-09

Ok, then a small concrete question, if I may:

Will a write-then-read sequence on the same, properly aligned address work "correctly" regardless of the bus matrix or the other things mentioned? That is, is the write guaranteed to be completed by its target before the CPU gets the read value (with all the needed waits)?

-- pa

Piranha · 2020-08-09

ARM architectural requirements:

For Device and Strongly-ordered memory types: yes. For the Normal memory type: no, a DMB between the write and the read is required.

Cortex-M implementation details:

At the time of writing, except for the Cortex-M7 and the upcoming Cortex-M55, no Cortex-M core is capable of instruction reordering, so the others will work correctly on all memory types even without memory barrier instructions.

Also remember that both accesses need to be volatile, or there must be a compiler barrier between them, for the respective instructions to be compiled in the order in which they are written in the code.
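
To illustrate the compiler-level part of this, here is a small sketch of the two options just mentioned. The function names are hypothetical. `std::atomic_signal_fence` is used as the compiler barrier on the assumption that the toolchain is GCC or Clang, where it typically behaves like an `asm volatile("" ::: "memory")` clobber. Neither variant emits a hardware barrier such as DMB.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Option 1: both accesses are volatile, so the compiler must emit the store
// and then the load, in program order.
inline std::uint32_t write_then_read(volatile std::uint32_t* addr,
                                     std::uint32_t value) {
    assert(addr != nullptr);
    *addr = value;   // store is emitted
    return *addr;    // load is emitted after the store
}

// Option 2: plain accesses separated by a compiler-only fence.
// Assumption: the compiler treats the fence as a full compiler barrier
// (typical for GCC/Clang). No CPU barrier instruction is generated.
inline std::uint32_t write_fence_read(std::uint32_t* addr,
                                      std::uint32_t value) {
    assert(addr != nullptr);
    *addr = value;
    std::atomic_signal_fence(std::memory_order_seq_cst);
    return *addr;
}
```

On a core that can reorder accesses to Normal memory, a hardware barrier would still have to be added between the store and the load if the architectural guarantee matters.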

Pavel A. · 2020-08-10

Thank you, @Piranha, I'm going to RTFM... but if this is true about normal memory, it would break most "normal" programs.

Angry users would come to inventors of such architecture with torches and axes.

Maybe this requirement of a barrier between write and read applies only when the memory is not actually "normal", though the processor thinks it is?

(As Jan wrote about the backup SRAM of H7 - it is mapped as "normal" but actually is not?)

-- pa

waclawek.jan · 2020-08-10

> if this is true about the normal memory,

>it would break most "normal" programs.

Why do you think so?

The sole purpose of a program in an MCU is to perform accesses to the peripherals. These have to happen in the order in which they are written in the program, and this is indeed ensured: at the compiler level by qualifying the registers as volatile, and at the processor level by having them located in Device memory. Everything else affects only the timing. Allowing reordering and buffering/caching speeds up execution, and that's what everybody wants.

Note that even if a value written by the program to Normal memory is not physically stored to the memory, the execution is still correct. Either the processor can infer from the flow that the written value is not yet needed (because it's not read, and if the same address is written in the program again, the old value may be safely forgotten without ever being written), so it can delay the write to some suitable later time; or it can serve the value to a read from a cache or a buffer at the processor boundary. That is, provided that the variable in question has not already been eliminated altogether at compile time.

The backup SRAM is sort of a crossover between a peripheral and memory. The default mapping in H7 puts it at Normal (and cached), which may be quite OK, once you perform the "special procedure" to unlock it after reset, and perhaps ensure proper writeback in an "early powerdown warning" interrupt. Or, you can remap it using the MPU. Or, you can avoid using the F7/H7...
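
As a rough illustration of the MPU remapping option, the sketch below computes the attribute word for one region, assuming the ARMv7-M MPU (as in the Cortex-M7) and its `RASR` register layout. The names `MemType`, `size_field`, and `rasr_value` are hypothetical, the 4 KB region size in the usage note is an assumption about the backup SRAM, and the full-access/execute-never settings are just one plausible choice. On the target the result would be written to `MPU->RASR` after selecting the region and setting its base address.

```cpp
#include <cassert>
#include <cstdint>

enum class MemType { StronglyOrdered, Device, NormalNonCacheable };

// RASR.SIZE encodes a power-of-two region of 2^(SIZE+1) bytes, minimum 32 bytes.
inline std::uint32_t size_field(std::uint32_t size_bytes) {
    assert(size_bytes >= 32u && (size_bytes & (size_bytes - 1u)) == 0u);
    std::uint32_t bits = 0;
    while ((1u << bits) < size_bytes) { ++bits; }
    return bits - 1u;
}

// Builds an ARMv7-M MPU_RASR value: full access, execute-never, region enabled.
inline std::uint32_t rasr_value(MemType type, std::uint32_t size_bytes) {
    constexpr std::uint32_t XN      = 1u << 28;  // execute never
    constexpr std::uint32_t AP_FULL = 3u << 24;  // read/write for all privilege levels
    constexpr std::uint32_t TEX_1   = 1u << 19;  // TEX = 0b001
    constexpr std::uint32_t B       = 1u << 16;  // bufferable
    constexpr std::uint32_t ENABLE  = 1u << 0;

    std::uint32_t attr = 0u;
    switch (type) {
        case MemType::StronglyOrdered:    attr = 0u;    break;  // TEX=0, C=0, B=0
        case MemType::Device:             attr = B;     break;  // TEX=0, C=0, B=1
        case MemType::NormalNonCacheable: attr = TEX_1; break;  // TEX=1, C=0, B=0
    }
    return XN | AP_FULL | attr | (size_field(size_bytes) << 1) | ENABLE;
}
```

For a 4 KB region, `rasr_value(MemType::Device, 4096)` gives `0x13010017`. Which memory type is the right trade-off for the backup SRAM is a design decision this sketch does not make.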

JW

Pavel A. · 2020-08-11

> Why do you think so?

Because most normal programs just write to and read from normal memory without any explicit barriers.

Any decent MCU must have at least some amount of normal memory (stack...).

/* Yes, there is speculative execution and other such things ... the Intel folks thought they were so smart and could get away with it, but it ended badly */

> Or, you can avoid using the F7/H7

Sorry, cannot. Must cope with what the customer wants ;)

-- pa

waclawek.jan · 2020-08-11

> speculative execution... but it ended badly

You are using it every day. F7/H7 does it too. There is no more Moore.

JW

nickbeth · 2024-08-19

Why can we remove the last one? Isn't it needed to ensure the backup region can't be written to?

Imagine that, for some reason, we do this:

HAL_PWR_EnableBkUpAccess();
__DMB();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);
__DMB();
HAL_PWR_DisableBkUpAccess();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);

Wouldn't this allow some bytes to be written to the backup SRAM before writes are actually disabled? Wouldn't another DMB be necessary between the disable call and the last copy?

waclawek.jan · 2024-08-19

@nickbeth,

In the context of the 'F4 (or other STM32 with similarly connected PWR and BKPSRAM), yes, you may need a delay after disabling backup domain access in PWR if you anticipate that rogue accesses to BKPSRAM may follow immediately after that disable. As I wrote above, the barrier instruction there functions only as a delay, and it may or may not be sufficient depending on the particular APB divider and maybe also some other circumstances.

But usually you write your code so that such rogue accesses are possible only through some inadvertent action later in the code, i.e. so that there's enough delay in the "proper" code following the PWR_CR.DBP clear.
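
One commonly used alternative to a fixed delay is to read the register back after the write, since the read has to reach the peripheral. The sketch below shows the idea on a host-side mock; it is not taken from this thread, and whether a read-back is sufficient on a given part and bus configuration is an assumption that would need checking. `FakePwr` and `disable_backup_access` are hypothetical names, and the DBP bit position (bit 8) is assumed from the 'F4 `PWR_CR` layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the PWR register block so the sketch runs on a host.
// On the target this would be the device header's PWR peripheral.
struct FakePwr {
    volatile std::uint32_t CR;
};

// Assumed DBP bit position (bit 8 in the 'F4 PWR_CR register).
constexpr std::uint32_t DBP_BIT = 1u << 8;

// Clears DBP, then polls the register until the bit reads back as cleared.
inline void disable_backup_access(FakePwr& pwr) {
    pwr.CR = pwr.CR & ~DBP_BIT;          // request write protection
    while ((pwr.CR & DBP_BIT) != 0u) {   // volatile read-back of the same register
        // spin until the cleared bit is visible
    }
}
```

On the mock the loop exits immediately. On hardware the number of iterations, if any, would depend on the bus path to PWR.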

JW