How atomic are the STRD/LDRD instructions on Cortex-M4?

TDK
Super User

The STRD/LDRD instructions store and load a 64-bit value. Internally, they are broken up into 2x 32-bit accesses.

Screenshot 2026-03-08 113823.png

Since the element size is a word and not a double-word, they are not atomic in the way that an aligned uint32_t access is.

 

Questions are:

Is this instruction interruptible between the first and second 32-bit accesses?

If the instruction does get interrupted, does it restart or resume? If it restarts, it will re-write the first word, for a total of 3x 32-bit accesses. If it resumes, it will only write the last word.

The ARM reference manual has this to say on the subject:

Screenshot 2026-03-08 114246.png

https://developer.arm.com/documentation/ddi0403/latest

To me, this text suggests they are restarted, but the two uses of "might" leave some ambiguity.

 

But let's not rely on manuals or what people say on the internet. What actually happens in the hardware?

If you feel a post has answered your question, please click "Accept as Solution".
1 ACCEPTED SOLUTION

Accepted Solutions
TDK
Super User

Short answer:

We find that STRD is interruptible mid-instruction and is restarted rather than resumed when the interrupt returns.

 

Evidence:

To test, let's write a program and see what happens. The test program uses TIM1 as the interrupt source and interrupts execution roughly every 40 cycles; the ISR varies the period slightly on each interrupt. This is on an STM32F411RE.

In the main loop, we add a constant to a uint64_t so that each half increments by one. The disassembly shows the compiler uses LDRD/STRD instructions for this.

Screenshot 2026-03-08 121538.png

In the interrupt handler, we check whether the two halves are ever out of sync. If they are, we save the value and then reset the variable to 0 to see whether both halves will be re-written (instruction is restarted) or only the second half (instruction is resumed). The main loop then detects this and the program stops.


So what happens?

The interrupt handler is called between the two 32-bit writes and detects an intermediate state on the uint64_t. When this happens, the first uint32_t is updated while the second still has the old value, so they differ by 1. This means STRD is interruptible mid-instruction.

After returning from this intermediate state, the main thread re-writes both halves of the uint64_t. This means STRD is restarted rather than resumed when it gets interrupted.
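To make the restart-versus-resume reasoning concrete, here is a small host-side model of the experiment. It is only a sketch, not code from the test program: the names (simulate_interrupted_strd, strd_model_t, last_mismatch) are made up for illustration, and it assumes the low word is the one written first, which the post does not state explicitly.

```c
#include <stdint.h>

/* Hypothetical host-side model; none of these names come from the post. */
typedef enum { MODEL_RESTART, MODEL_RESUME } strd_model_t;

static uint64_t mem;       /* stands in for gData2 */
static uint64_t mismatch;  /* stands in for gMismatch */

/* Model the two 32-bit bus writes that make up one STRD. */
static void write_lo(uint32_t v) {
    mem = (mem & 0xFFFFFFFF00000000ULL) | v;
}

static void write_hi(uint32_t v) {
    mem = (mem & 0x00000000FFFFFFFFULL) | ((uint64_t)v << 32);
}

/* Same consistency check as the interrupt handler: save and zero on mismatch. */
static void isr(void) {
    uint64_t x = mem;
    if ((x >> 32) != (uint32_t)x) {
        mismatch = x;
        mem = 0;
    }
}

/* Store `value` over `start`, taking an interrupt between the two word
   writes. Assumption: the low word is written first. Returns what memory
   holds once the store has completed under the chosen model. */
uint64_t simulate_interrupted_strd(uint64_t start, uint64_t value,
                                   strd_model_t model) {
    mem = start;
    mismatch = 0;
    write_lo((uint32_t)value);          /* first word lands */
    isr();                              /* interrupt taken mid-instruction */
    if (model == MODEL_RESTART) {
        write_lo((uint32_t)value);      /* restart: first word written again */
    }
    write_hi((uint32_t)(value >> 32));  /* second word */
    return mem;
}

/* Value the modelled ISR captured during the last simulation. */
uint64_t last_mismatch(void) {
    return mismatch;
}
```

In this model, a restarted store leaves both halves holding the new value, while a resumed store leaves the first half at 0. The first outcome is the one that matches what the hardware test reports.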


Here's the interrupt handler:

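void TIM1_UP_TIM10_IRQHandler(void) {
    if (TIM1->SR & TIM_SR_UIF) {
        // clear the update flag
        TIM1->SR = ~TIM_SR_UIF;
        // vary reload by up to 15 cycles
        TIM1->ARR = (TIM1->ARR & 0xFFF0U) + (((TIM1->ARR & 0xFU) + 1) % 16);
        uint64_t x = gData2;
        // halves differ if a store was interrupted between its two word writes
        if ((x >> 32) != (uint32_t)x) {
            gMismatch = x;  // save the intermediate value
            gData2 = 0;     // reset to see which halves get re-written
        }
    }
}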

 

Here's the relevant snippet in main.c:

const uint64_t gDelta = 0x0000000100000001U;
volatile uint64_t gData2;
volatile uint64_t gMismatch;

...

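    volatile uint64_t before = gData2;  // kept so it can be inspected in the debugger
    UNUSED(before);
    gData2 += gDelta;                   // compiles to LDRD, 64-bit add, STRD
    if (gMismatch) {
        // the ISR saw an intermediate value; capture what the store left behind
        volatile uint64_t after = gData2;
        UNUSED(after);
        while (1);
    }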
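As a quick check on the arithmetic behind gDelta, here is a minimal sketch showing that a single 64-bit add bumps both halves by one, along with the consistency test the handler relies on. The helper names (halves_match, add_delta) are hypothetical and only serve the illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define DELTA 0x0000000100000001ULL  /* same value as gDelta */

/* True when the upper and lower 32-bit halves are equal. */
bool halves_match(uint64_t x) {
    return (uint32_t)(x >> 32) == (uint32_t)x;
}

/* One main-loop step: a single 64-bit add increments both halves. */
uint64_t add_delta(uint64_t x) {
    return x + DELTA;
}
```

One caveat with this scheme: once the low half wraps past 0xFFFFFFFF, the carry spills into the high half and the halves stop matching without any interrupt being involved. That takes 2^32 iterations, so it should not matter for a short run like this one.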

 

Here's the Expressions tab showing the result:

Screenshot 2026-03-08 121831.png


3 REPLIES 3
gbm
Principal

Well done! Please mark the solution for yourself. :)

Surely it could not be continued/resumed - no flags present for that. I am surprised it's aborted and restarted.

Another important feature of LDRD/STRD is the alignment requirement. I ran into that problem once, which is why I remembered that these instructions exist in v7-M.
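For anyone who hits the alignment issue: on ARMv7-M, LDRD/STRD need a word-aligned (4-byte) address, and an unaligned address faults even where unaligned LDR/STR would be tolerated. A rough sketch of a check and a memcpy-based fallback follows; the helper names are made up for illustration, and how the compiler lowers the memcpy depends on the toolchain and its flags.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* True if p is 4-byte aligned, which is what LDRD/STRD need on ARMv7-M. */
bool is_word_aligned(const void *p) {
    return ((uintptr_t)p & 0x3U) == 0;
}

/* Read a 64-bit value from a possibly unaligned address. Going through
   memcpy avoids dereferencing a misaligned uint64_t pointer, so the
   compiler has no licence to assume the address suits LDRD. */
uint64_t load_u64_unaligned(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A typical way to get into trouble is casting a byte-buffer pointer at an odd offset (for example inside a packed protocol frame) to uint64_t * and dereferencing it.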

My STM32 stuff on github - compact USB device stack and more: https://github.com/gbm-ii/gbmUSBdevice
Pavel A.
Super User

One more curious detail. Look at how 64-bit types are supported by std::atomic in GNU C++ (ST toolchain 12.3 with g++ v12.3.1 20230626): ATOMIC_LLONG_LOCK_FREE is defined there as 1 (meaning "sometimes lock-free"), while ATOMIC_LONG_LOCK_FREE is defined as 2 ("always lock-free").

The following compiles but fails to link:

#include <atomic>
#include <cstdint>

extern "C"
void test_cpp()
{
#if ATOMIC_LONG_LOCK_FREE
   std::atomic<int32_t> a32{0};
   a32++; // ok
#endif

#if ATOMIC_LLONG_LOCK_FREE
   std::atomic<int64_t> a64{42};
   a64++; // unresolved lib function!
#endif
}

because of an unresolved reference to `__atomic_fetch_add_8'.

So the bad news is that GCC doesn't miraculously make 64-bit operations atomic with respect to interrupts.

The good news is that the compiler tells you at compile time that 64-bit types on this architecture are less atomic than types of 32 bits or fewer, and we can even provide our own implementation of __atomic_fetch_add_8 that, for example, disables interrupts.
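A minimal sketch of that idea, under some loud assumptions: irq_save() and irq_restore() are hypothetical hooks (on a Cortex-M target they might wrap the CMSIS __get_PRIMASK()/__disable_irq() and __set_PRIMASK() intrinsics), and here they are host stubs so the logic can be compiled anywhere. The function is deliberately given its own name, fetch_add_u64_irqsafe. Exposing it under the symbol the linker asks for would need the exact signature your toolchain expects, which should be checked rather than guessed.

```c
#include <stdint.h>

/* Hypothetical hooks. These host stubs only track state; a real port
   would mask and restore interrupts on the target instead. */
static uint32_t irq_masked;  /* 1 while "interrupts" are masked */

static uint32_t irq_save(void) {
    uint32_t prev = irq_masked;
    irq_masked = 1;
    return prev;
}

static void irq_restore(uint32_t prev) {
    irq_masked = prev;
}

/* Add val to *ptr and return the previous value, keeping interrupts
   masked for the whole read-modify-write so no ISR can observe or
   modify a half-written value. */
uint64_t fetch_add_u64_irqsafe(volatile uint64_t *ptr, uint64_t val) {
    uint32_t state = irq_save();
    uint64_t old = *ptr;
    *ptr = old + val;
    irq_restore(state);
    return old;
}
```

Saving and restoring the previous mask state, rather than unconditionally re-enabling, keeps the helper safe to call from code that already runs with interrupts masked. Note that this only guards against interrupts on the same core; it does nothing about DMA or other bus masters touching the same memory.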