Resolved! Why do some misaligned operands require more memory cycles than expected to read from the memory of a Cortex-M4?
I wrote some software that reliably measures the number of memory cycles required to read 16- and 32-bit operands from Cortex-M4 memory. When a 16-bit operand fits within a 32-bit physical word, I would have expected that only a single memory ...
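To make the expectation above concrete, here is a minimal sketch of the naive model it rests on: count how many aligned 32-bit words an operand touches, and assume one memory access per word. The helper names `words_spanned` and `is_naturally_aligned` are hypothetical, chosen for illustration only. This models the expectation, not what the Cortex-M4 actually does, and the measured cycle counts may well differ from it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Naive model (an assumption, not documented Cortex-M4 behaviour):
 * returns how many aligned 32-bit words an operand of size_bytes
 * starting at byte address addr overlaps.
 */
static unsigned words_spanned(uint32_t addr, unsigned size_bytes)
{
    assert(size_bytes > 0);
    uint32_t first_word = addr / 4u;                      /* word holding the first byte */
    uint32_t last_word  = (addr + size_bytes - 1u) / 4u;  /* word holding the last byte  */
    return (unsigned)(last_word - first_word + 1u);
}

/* Nonzero when addr is a multiple of the operand size (natural alignment). */
static int is_naturally_aligned(uint32_t addr, unsigned size_bytes)
{
    assert(size_bytes > 0);
    return (addr % size_bytes) == 0u;
}
```

Under this model, a 16-bit operand at byte offset 1 of a word is misaligned but still lies inside one 32-bit word, so `words_spanned(0x20000001u, 2)` yields 1, while the same operand at offset 3 straddles two words and yields 2. The address `0x20000000` is used here only as an example value.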