Why do some misaligned operands require more memory cycles than expected to read from the memory of a Cortex-M4?

DLewi
Associate II

I wrote some software that reliably measures the number of memory cycles required to read 16- and 32-bit operands from memory on a Cortex-M4. When a 16-bit operand fits within a 32-bit physical word, I would have expected a single memory cycle to be enough to read or write the data, but that's apparently not the case if it's in the MIDDLE of a 32-bit physical word. And when a 32-bit operand is stored at an address that is off by 1 or 3 from word alignment, it takes THREE cycles (instead of the expected two). Take a look at the attached summary of results. Can anyone explain why the results are not as expected (as shown in red)?

Thanks!

Dan

1 ACCEPTED SOLUTION

Halfword transfers on AHB must be halfword-aligned (i.e. have even addresses), thus the "middle" example is split into two byte-wide accesses. This rule simplifies the logic of attached peripherals.

The word-unaligned access in the second example is split into one byte, one halfword and one byte.
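
As a rough illustration of the splitting rule described above, here is a small host-side C model. It is not taken from the TRM, and `count_transfers` is a hypothetical helper. It assumes every bus transfer must be a naturally aligned byte, halfword, or word, and it counts how many such transfers one access needs. Wait states, buffering, and arbitration are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not vendor code: count the aligned bus transfers
 * needed for an access of `size` bytes (1, 2 or 4) starting at `addr`.
 * Assumption: each transfer is the largest naturally aligned chunk
 * (4, 2 or 1 bytes) that starts at the current address and still fits
 * in the bytes that remain. */
static unsigned count_transfers(uint32_t addr, unsigned size)
{
    unsigned transfers = 0;

    assert(size == 1 || size == 2 || size == 4);

    while (size > 0) {
        unsigned chunk = 4;

        /* Shrink the chunk until it fits and is aligned at addr. */
        while (chunk > size || (addr % chunk) != 0)
            chunk /= 2;

        addr += chunk;
        size -= chunk;
        transfers++;
    }
    return transfers;
}
```

Under these assumptions, a halfword at byte offset 1 within a word needs two transfers, and a word at offset 1 or 3 needs three (byte, halfword, byte), which agrees with the measurements in the question.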

JW

5 REPLIES
Radosław
Senior II

Because the proper alignment is handled by the bus rather than the core, and it still takes time.

KnarfB
Principal III

Your measurements are consistent with the TRM: https://developer.arm.com/documentation/100166/0001/Programmers-Model/Instruction-set-summary/Load-store-timings

It's not only the memory accesses that count, but there seems to be one additional cycle used by a shifter when the least address bit is 1. Those are, in general, decisions in the microarchitecture. There is a trade-off between smaller chip size (lower gate count) and higher performance (lower cycle count).

Don't know why this decision was made, but note that some other MCUs do not support misaligned accesses at all. The compiler can help avoid unaligned accesses, so hardware support for them is not strictly needed.
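
To illustrate the point about the compiler: C compilers place ordinary struct members at naturally aligned offsets by inserting padding, so plain member accesses do not run into the split-access cases. A minimal sketch follows; the `sample` struct and the `is_aligned` helper are made up for illustration, and `_Alignof` assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative struct: the compiler pads after `tag` so that `value`
 * starts at an offset that is a multiple of its alignment. */
struct sample {
    uint8_t  tag;
    uint32_t value;
};

/* Hypothetical helper: nonzero if `p` is a multiple of `alignment`. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

For example, `is_aligned(&s.value, _Alignof(uint32_t))` holds for any `struct sample s`. Unaligned accesses typically show up only when the natural layout is overridden, e.g. with packed structures or pointer casts into byte buffers.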

hth

KnarfB

> It's not only the memory accesses that count, but there seems to be one additional cycle used by a shifter when the least address bit is 1.

No - with the misaligned halfword access, the basic timing is 2 cycles, corresponding to 2 memory accesses (byte-wide each).

Of course, wait states on memory accesses, buffering at various points, arbitration, resynchronization - all these things may add cycles.

> Don't know why this decision was made,

Alignment requirements significantly simplify the logic on the "slave" side. OTOH, having an attachment on the processor's bus interface that allows unaligned accesses is a blessing for things like building packets in communication buffers. (This is significant: other bus masters, e.g. DMA, still don't allow unaligned accesses.)
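
A minimal sketch of the packet-building case, assuming a byte buffer with a 32-bit field at an odd offset. The `put_u32` and `get_u32` helpers are hypothetical. They use `memcpy`, which is valid C at any alignment; on a core that supports unaligned accesses the compiler may reduce each call to a single load or store, but that depends on the compiler and its options. The value is stored in the machine's native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit value at any byte offset in buf,
 * in native byte order, without requiring the offset to be aligned. */
static void put_u32(uint8_t *buf, size_t offset, uint32_t value)
{
    assert(buf != NULL);
    memcpy(buf + offset, &value, sizeof value);
}

/* Hypothetical helper: load a 32-bit value from any byte offset. */
static uint32_t get_u32(const uint8_t *buf, size_t offset)
{
    uint32_t value;

    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

For instance, `put_u32(packet, 1, length)` writes a length field immediately after a one-byte header, a layout that would otherwise have to be assembled byte by byte.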

JW

DLewi
Associate II

Wow! I must have really found the right place to ask the question! Thanks to all of you for your feedback. That explains it completely.

Best,

Dan