Memory/Instruction barriers before writing to the backup SRAM

lutztonineubert · ‎2020-08-05

The following code didn't work because of a missing data barrier before writing to the backup SRAM:

HAL_PWR_EnableBkUpAccess();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);
HAL_PWR_DisableBkUpAccess();

Adding a DSB solved the issue for now.

HAL_PWR_EnableBkUpAccess();
__DSB();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);
HAL_PWR_DisableBkUpAccess();

It is understandable that enabling backup access (which sets a single bit) needs to be fully completed before writing to the actual memory addresses.

What I don't fully understand is whether a DMB would be enough (it also works), and whether I need an additional ISB so that the enable and disable don't both take effect before the actual copy, like so:

HAL_PWR_EnableBkUpAccess();
__DSB();
std::copy(buffer, buffer + num_bytes, BaseAddress + address);
__ISB();
HAL_PWR_DisableBkUpAccess();

Can someone help me out here and explain what the correct way would be?

Pavel A. · ‎2020-08-05

+1 Read-back is an intuitive and portable way to ensure both the flushing of writes and the necessary delay, at once.

Yes, H7 has its own can of worms...
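For readers who want to see what read-back means in practice, here is a minimal sketch, not the HAL's implementation. It sets an enable bit and then reads the same register back until the bit is observed set. The names `fake_pwr_cr`, `kBackupAccessBit` and `enable_with_readback` are hypothetical, and the register is a plain variable so the snippet builds on a host; on a real device you would pass the vendor's PWR control register instead. The bit position is an assumption and must be checked against the reference manual.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the PWR control register. On target, use the
// vendor's register definition instead of a plain variable.
static volatile std::uint32_t fake_pwr_cr = 0;

// Assumed position of the backup-domain write-enable bit (check the manual).
constexpr std::uint32_t kBackupAccessBit = 1u << 8;

// Sets the enable bit, then reads the register back until the bit is seen set.
// Returns false if the bit never appears within max_spins reads.
inline bool enable_with_readback(volatile std::uint32_t& reg,
                                 unsigned max_spins = 1000) {
    assert(max_spins > 0);
    reg = reg | kBackupAccessBit;              // write the enable bit
    for (unsigned i = 0; i < max_spins; ++i) {
        if ((reg & kBackupAccessBit) != 0) {   // volatile read-back
            return true;
        }
    }
    return false;
}
```

The read cannot return the new value until the write has reached the peripheral, which is why read-back is suggested here over relying on a barrier as a delay.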

Piranha · ‎2020-08-05

Indeed! But at least on Cortex-M7 this still needs to be combined with memory barriers both before and after the SRAM access.
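A rough sketch of that barrier placement, under some assumptions: `data_sync_barrier` and `copy_with_barriers` are hypothetical names, the barrier is a real DSB only when compiled for ARMv7-M or ARMv7E-M, and it falls back to a compiler-only fence elsewhere so the snippet still builds on a host. It also assumes write access has already been enabled and confirmed before the call.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical barrier wrapper: DSB on ARMv7-M/ARMv7E-M targets,
// compiler-only fence on any other platform.
inline void data_sync_barrier() {
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    __asm volatile("dsb 0xF" ::: "memory");
#else
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Copies num_bytes from src to dst with one barrier before and one after.
// Assumes the caller has already enabled (and confirmed) write access.
inline void copy_with_barriers(const std::uint8_t* src, std::size_t num_bytes,
                               std::uint8_t* dst) {
    assert(src != nullptr && dst != nullptr);
    data_sync_barrier();                    // order earlier accesses before the first store
    std::copy(src, src + num_bytes, dst);
    data_sync_barrier();                    // order the stores before whatever follows
}
```

The caller would disable access again only after `copy_with_barriers` returns. Whether the barriers alone are sufficient beyond the core is exactly what the rest of this thread questions.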

lutztonineubert · ‎2020-08-05

So read-back or DMB? Or are both valid? :(

waclawek.jan · ‎2020-08-05

As I wrote above: DMB/DSB/whatever in 'F4 acts only as a delay and may not be sufficient.

JW

waclawek.jan · ‎2020-08-05

Yes, the CM7 is different, definitely. Wouldn't caching play some detrimental role there, too? I am also not sure whether there is some potential issue with different bus clocks there, especially in 'H7.

JW

Piranha · ‎2020-08-05

Jan's answer is the correct one. Mark it as the best answer so that other people aren't misled. :)

Piranha · ‎2020-08-05

Yes, except for DTCM, D-cache management can be necessary, and those SCB_***() functions also include memory barriers. And then it can all be changed in different ways with the MPU; one can even set memory to a strongly ordered type, which should wait for the bus access to complete. :D

Piranha · ‎2020-08-05

All of this raises the question about the AHB buses: do these guarantee that a write followed by a memory barrier has completed over the bus, or can there still be some delay? What about synchronization of AXI and AHB at the same (F7) and different (H7) frequencies?

@Amel NASRI , @Imen DAHMEN , or someone from ST - can someone finally comment/solve the long-standing mystery of bus synchronization and delays?

waclawek.jan · ‎2020-08-05

The processor - and its facilities - does not "see" beyond its boundaries. In other words, all barriers etc. act only upon the processor and the attached write buffer (that seems to include the bitbanding attachment in case of CM3/CM4 - I have a fun story with that one on the NXP LPC17xx, where GPIO is in bit-bandable *memory* (thus normal) area).

In the case of the CM7, that probably includes some or all of the AXIM stuff; I'm not sure. As I've said, I am not interested, exactly because of the complexity: I work more on the "control" side, so I give up processing power in favour of control.

In other words, whatever is beyond the bus matrix is not controlled by the processor, and may and does involve various timing issues. The biggest fun is with inter-bus and inter-module interconnections. It's ST which is supposed to describe it. I understand it's a hard task; OTOH, they generally solve it by massive handwaving. (Not that other manufacturers are better, but that's no argument of course.)

JW

Piranha · ‎2020-08-05

Are you sure that the CPU cannot see the completion of an access even with the strongly ordered memory type?

From AN4838:

Strongly ordered memory: everything is always done in the programmatically listed order, where the CPU waits the end of load/store instruction execution (effective bus access) before executing the next instruction in the program stream. This can cause a performance hit.