DLewi
Associate
January 26, 2021
Solved

Why do some misaligned operands require more memory cycles than expected to read from the memory of a Cortex-M4?


I wrote some software that reliably measures the number of memory cycles required to retrieve 16- and 32-bit operands from Cortex-M4 memory. When a 16-bit operand fits within a 32-bit physical word, I expected a single memory cycle to be enough to read or write it, but apparently that is not the case when the operand sits in the MIDDLE of a 32-bit physical word. And when a 32-bit operand is stored at an address off by 1 or 3 from word alignment, it takes THREE cycles instead of the expected two. Take a look at the attached summary of results. Can anyone explain why the results are not as expected (as shown in red)?

Thanks!

Dan

    Best answer by waclawek.jan


    Radosław
    Associate II
    January 26, 2021

    Because the proper alignment is handled by the bus, not the core, and it still takes time.

    waclawek.jan
    Best answer
    Super User
    January 26, 2021

    Halfword transfers on AHB must be halfword-aligned (i.e. have even addresses), so the "middle" example is split into two byte-wide accesses. This rule simplifies the logic of attached peripherals.

    The word-unaligned access in second example is split into one byte, one halfword and one byte.
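The splitting rule described in this answer can be sketched as a small C model. It is a hypothetical illustration, not ARM's actual bus-interface logic: `split_access` and `transfer_t` are made-up names, and the model only counts transfers, ignoring wait states, buffering and arbitration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bus transfer produced by the (hypothetical) splitting model. */
typedef struct {
    uint32_t addr;  /* start address of this transfer */
    unsigned width; /* transfer width in bytes: 1, 2 or 4 */
} transfer_t;

/* Hypothetical model of how a CPU load/store of `size` bytes (1..4) at
   `addr` is split into naturally aligned bus transfers: at each step,
   take the widest transfer (word, halfword, byte) that is aligned at the
   current address and no wider than the bytes still remaining.
   Returns the number of transfers written to `out`. */
size_t split_access(uint32_t addr, unsigned size,
                    transfer_t out[], size_t max_out)
{
    size_t n = 0;
    assert(size >= 1 && size <= 4);
    while (size > 0 && n < max_out) {
        unsigned width = 4;
        /* narrow until aligned at addr and no wider than what is left */
        while (width > 1 && (addr % width != 0 || width > size))
            width /= 2;
        out[n].addr = addr;
        out[n].width = width;
        n++;
        addr += width;
        size -= width;
    }
    return n;
}
```

With this model, a halfword at an address ending in 1 or 3 becomes two 1-byte transfers, a word at an address ending in 1 or 3 becomes transfers of width 1, 2, 1, and a word at an address ending in 2 becomes two halfwords, which matches the transfer counts discussed in the question.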

    JW

    KnarfB
    Super User
    January 26, 2021

    Your measurements are consistent with the TRM https://developer.arm.com/documentation/100166/0001/Programmers-Model/Instruction-set-summary/Load-store-timings. It's not only the memory accesses that count; there seems to be one additional cycle used by a shifter when the least significant address bit is 1. In general, these are microarchitecture decisions: a trade-off between smaller chip size (lower gate count) and higher performance (lower cycle count).

    I don't know why this decision was made, but note that some other MCUs (e.g. Cortex-M0/M0+ cores) do not support misaligned accesses at all. The compiler can help avoid unaligned accesses, so there is no strict need for hardware support.
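One common way to let the compiler deal with misalignment is to copy the bytes with `memcpy` instead of dereferencing a cast, misaligned pointer. The helper name `read_u32_unaligned` is hypothetical. With GCC for ARM, the `-mno-unaligned-access` option additionally stops the compiler from emitting unaligned loads and stores on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value that starts at any byte
   address. memcpy is well-defined for any alignment, so the compiler is
   free to emit a single load where the target permits unaligned access,
   or byte loads where it does not. A cast such as *(uint32_t *)p on a
   misaligned p is undefined behaviour in C. */
uint32_t read_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v; /* result is in the CPU's native byte order */
}
```

Which instructions this compiles to depends on the compiler, optimisation level and target flags, so it is worth checking the generated code when the cycle count matters.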

    hth

    KnarfB

    waclawek.jan
    Super User
    January 26, 2021

    > It's not only the memory accesses that count, but there seems to be one additional cycle used by a shifter when the least address bit is 1.

    No - with the misaligned halfword access, the basic timing is 2 cycles, corresponding to 2 memory accesses (byte-wide each).

    Of course, waitstates on memory accesses, buffering at various points, arbitration, resynchronization - all these things may add up cycles.

    > Don't know why this decision was made,

    Alignment requirements significantly simplify the logic on the "slave" side. OTOH, allowing unaligned accesses through logic attached to the processor's bus interface (this is significant: other bus masters, e.g. DMA, still don't allow unaligned accesses) is a blessing for things like building packets in communication buffers.
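To illustrate the packet-building case, here is a sketch with a made-up header layout (a 1-byte type followed by a 4-byte little-endian length at offset 1, so the length field is not word-aligned). The helpers `put_le32`, `get_le32` and `build_header` are hypothetical. Writing the field byte by byte uses only byte accesses, so it works even where unaligned word accesses are not allowed, and it fixes the byte order independently of the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at p in little-endian order using four byte writes. */
void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Read a little-endian 32-bit value from p using four byte reads. */
uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical header: type byte at offset 0, length at offset 1.
   Returns the number of header bytes written into buf. */
size_t build_header(uint8_t *buf, uint8_t type, uint32_t length)
{
    assert(buf != NULL);
    buf[0] = type;
    put_le32(&buf[1], length); /* offset 1: deliberately not word-aligned */
    return 5;
}
```

On a Cortex-M4 the core could instead perform the misaligned word store itself (split into several bus transfers, as described above); the byte-wise version trades that for portability to cores and bus masters without unaligned support.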

    JW

    DLewi
    Author
    Associate
    January 26, 2021

    Wow! I must have really found the right place to ask the question! Thanks to all of you for your feedback. That explains it completely.

    Best,

    Dan