
Solved

Why do some misaligned operands require more memory cycles than expected to read from the memory of a Cortex-M4?

January 26, 2021
5 replies
1596 views

I wrote some software that reliably measures the number of memory cycles required to retrieve 16- and 32-bit operands from the Cortex-M4 memory. When a 16-bit operand fits within a 32-bit physical word, I would have expected a single memory cycle to be enough to read or write the data, but that's apparently not the case if it's in the MIDDLE of a 32-bit physical word. And when a 32-bit operand is stored at an address off by 1 or 3 from being word-aligned, it takes THREE cycles (instead of the expected two). Take a look at the attached summary of results. Can anyone explain why some results (shown in red) are not what I expected?

Thanks!

Dan

Memory-Cycles.pdf

This topic has been closed for replies.


Radosław

Associate II

Because the proper alignment is handled by the bus, not the core, and that still takes time.

waclawek.jan (Best answer)

Super User

Halfword transfers on AHB must be halfword-aligned (i.e. have even addresses), so the "middle" example is split into two byte-wide accesses. This rule simplifies the logic of attached peripherals.

The word-unaligned access in the second example is split into one byte, one halfword and one byte.

JW

KnarfB

Super User

Your measurements are consistent with the load/store timings in the TRM (https://developer.arm.com/documentation/100166/0001/Programmers-Model/Instruction-set-summary/Load-store-timings). It's not only the memory accesses that count; there seems to be one additional cycle used by a shifter when the least address bit is 1. In general, these are microarchitecture decisions: a trade-off between smaller chip size (lower gate count) and higher performance (lower cycle count).

Don't know why this decision was made, but note that some other MCUs (for example, those based on Cortex-M0/M0+) do not support misaligned accesses at all. The compiler can help avoid unaligned accesses, so hardware support for them is not strictly necessary.

hth

KnarfB

waclawek.jan

Super User

> It's not only the memory accesses that count, but there seems to be one additional cycle used by a shifter when the least address bit is 1.

No. With the misaligned halfword access, the basic timing is 2 cycles, corresponding to 2 memory accesses (each byte-wide).

Of course, wait states on memory accesses, buffering at various points, arbitration and resynchronization can all add cycles.

> Don't know why this decision was made,

Alignment requirements significantly simplify the logic on the "slave" side. On the other hand, allowing unaligned accesses by handling them in an attachment on the processor's bus interface (this is significant: other bus masters, e.g. DMA, still don't allow unaligned accesses) is a blessing for things like building packets in communication buffers.

JW

DLewi (Author)

Associate

Wow! I must have really found the right place to ask the question! Thanks to all of you for your feedback. That explains it completely.

Best,

Dan
