2014-11-25 09:55 AM
Dear All,
As we can see in the attached picture, ARM supports both unaligned and aligned data. Please advise: what mechanism does ARM use to identify two, three or four operands at a single memory address?
2014-11-25 09:59 AM
> Please advise what mechanism ARM uses to identify two, three or four operands at a single memory address.
A different memory address?
2014-11-25 10:09 AM
This slide is garbage. What dumb processor is not able to read a byte from a location that is not word-aligned?
The teacher mixes up unaligned reads (reading a word from a location that is not word-aligned) and variable allocation.
2014-11-25 10:11 AM
A 32-bit long takes 4 bytes; if the first one has an aligned address of 0x20001000, the second is at 0x20001004.
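To make the difference between allocation and an unaligned read concrete, here is a small C sketch. The helper names (`nth_long_address`, `is_word_aligned`, `read_word_at`) are hypothetical. `read_word_at` is a simplified model that assumes a little-endian memory image made of aligned 32-bit words; the real Cortex-M3 bus interface may split an unaligned access into differently sized transfers, but the point is the same: one unaligned word read costs more than one bus access.

```c
#include <stdint.h>
#include <stddef.h>

/* Allocation: consecutive elements of a uint32_t array are
   sizeof(uint32_t) = 4 bytes apart, so if element 0 is at 0x20001000,
   element 1 is at 0x20001004. */
static uint32_t nth_long_address(uint32_t base, size_t index)
{
    return base + (uint32_t)(index * sizeof(uint32_t));
}

/* An address is word-aligned when its two low bits are zero. */
static int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0u;
}

/* Unaligned read, simplified model (an assumption, not the real bus logic):
   `mem` is a little-endian image where word i covers byte offsets 4i..4i+3.
   A word read at a non-aligned byte offset is assembled from two aligned
   word reads; *accesses reports how many were needed. */
static uint32_t read_word_at(const uint32_t *mem, size_t byte_offset, int *accesses)
{
    size_t word = byte_offset / 4u;
    unsigned shift = (unsigned)(byte_offset % 4u) * 8u;

    if (shift == 0u) {
        *accesses = 1;
        return mem[word];
    }
    *accesses = 2;
    /* Low bytes come from the upper part of word `word`,
       high bytes from the lower part of word `word + 1`. */
    return (mem[word] >> shift) | (mem[word + 1] << (32u - shift));
}
```

With `mem = {0x33221100, 0x77665544}`, a read at byte offset 1 combines the bytes 0x11, 0x22, 0x33 and 0x44 into 0x44332211 and needs two accesses, while a read at offset 0 needs only one.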
2014-11-25 01:21 PM
This slide doesn't capture the essence of what is meant by unaligned data support in the CM3 (or at least STM32)!
A long word (32 bits) can be placed at any byte address, i.e. addr % 4 may be 0, 1, 2 or 3. The only times that long words and (16-bit) words are required to be aligned to their respective boundaries are when using DMA and when using the load/store multiple instructions (LDM/STM, which also covers PUSH and POP) or LDRD/STRD. The only downside of unaligned data is that it requires multiple memory accesses to read or write, but for the most part the overhead is scarcely noticeable. The support for unaligned data was actually the cause of a bug in an older version of the GCC C++ library function for implicit memcpy: it used the load/store multiple instructions to "optimise" the copy, which didn't work at all well when the items being copied were not aligned.
2014-11-26 08:37 AM
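The memcpy pitfall described above can be sketched in C as an alignment check in front of a word-block fast path. This is a hypothetical helper (`checked_copy`), not the GCC library code; it assumes, per the ARMv7-M rules, that LDM/STM and LDRD/STRD fault on unaligned addresses while a byte loop is always safe. The plain word loop stands in for the multiple-register transfers a real routine would use.

```c
#include <stdint.h>
#include <stddef.h>

/* True when both pointers sit on a 4-byte boundary. */
static int both_word_aligned(const void *a, const void *b)
{
    return (((uintptr_t)a | (uintptr_t)b) & 3u) == 0u;
}

/* Alignment-aware copy sketch. Returns the number of whole words moved on
   the fast path, so the caller can see which path was taken. */
static size_t checked_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = 0;

    if (both_word_aligned(d, s)) {
        /* Only on this path would LDM/STM (or LDRD/STRD) be legal on a
           Cortex-M3; here a simple word loop stands in for them. */
        uint32_t *dw = (uint32_t *)(void *)d;
        const uint32_t *sw = (const uint32_t *)(const void *)s;
        for (; n >= 4u; n -= 4u, words++)
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* Tail bytes, or the whole buffer when either pointer is misaligned. */
    while (n-- > 0u)
        *d++ = *s++;
    return words;
}
```

A production routine would also handle pointers that share the same misalignment by copying a few leading bytes until both are aligned; this sketch simply falls back to bytes in that case.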
2014-11-26 08:50 AM
Thumb instructions are mostly 16-bit, so usually PC += 2; the STM32 cannot execute 32-bit ARM-state instructions.
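To illustrate the "mostly 16-bit" point: in Thumb-2 the first halfword of an instruction tells the decoder its length. If bits [15:11] are 0b11101, 0b11110 or 0b11111, it is the first half of a 32-bit instruction; otherwise it is a complete 16-bit instruction. A minimal C sketch of that rule (the function names are hypothetical; branches and exceptions, which load the PC directly, are not modelled):

```c
#include <stdint.h>

/* Length in bytes of the Thumb instruction whose first halfword is given. */
static unsigned thumb_insn_size(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)(first_halfword >> 11) & 0x1Fu;
    /* 0x1D = 0b11101, 0x1E = 0b11110, 0x1F = 0b11111 start 32-bit encodings. */
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Sequential (non-branch) PC update: advance by 2 or 4 bytes. */
static uint32_t next_sequential_pc(uint32_t pc, uint16_t first_halfword)
{
    return pc + thumb_insn_size(first_halfword);
}
```

For example, `0xBF00` (NOP) advances the PC by 2, while `0xF000` (the first halfword of a BL) marks a 4-byte instruction.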
2014-11-26 09:14 AM
> Is the same procedure also used for Flash and RAM?
> I mean, since the PC (Program Counter) is incremented by 4

Program fetch is independent of FLASH: FLASH can be read/written as data memory, and programs can run from outside FLASH. Cortex-M runs exclusively in Thumb mode, i.e. the instruction width is 16 bits (a halfword; there are one-halfword and two-halfword instructions), and instructions have to be aligned to a 16-bit boundary (i.e. instruction address LSB = 0). As a matter of fact, instructions *are* fetched in 32-bit chunks through the I-bus, but 32-bit alignment of instructions is not required. However, if a two-halfword instruction that is not word-aligned is the target of a branch, there may be some speed penalty, as two fetches may be needed. This is why some compilers, with certain settings, tend to align functions to 32 bits. For details, see the Prefetch Unit description in the Cortex-M3 Technical Reference Manual. JW
2014-11-26 09:21 AM
Then you have flash lines on the F2/F4 being 128-bit wide, with the ART barrel shifting cached data into the prefetch port.
2014-11-26 01:43 PM
> Then you have flash lines on the F2/F4 being 128-bit wide, with the ART barrel shifting cached data into the prefetch port.
Indeed. So there might be an interesting speed penalty for rare (uncached) branches landing near the end of a line. There might also be a benefit from replacing rare (uncached) short jumps with longer "ITE" (if-then-else) instruction sequences, gaining both from the prefetches and from avoiding unnecessary cache fills. It's quite hard to estimate the relative cost of these. I wonder whether the costly commercial compilers account for these effects. It's also a pity that the ST designers did not study the existing jump-cache designs; e.g. the 100 MHz SiLabs '51 jump cache can lock cache lines, a minor (in transistor count) but potentially significant enhancement... JW
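Both fetch effects discussed in this thread (the two-fetch penalty for a non-word-aligned 32-bit branch target on the Cortex-M3 I-bus, and branches landing near the end of a 128-bit F2/F4 flash line) come down to the same arithmetic: how many aligned fetch units an instruction touches. Below is a simplified C model, assuming aligned fetch units of 4 or 16 bytes and ignoring the prefetch buffer and the ART cache entirely; the function names are hypothetical.

```c
#include <stdint.h>

/* Number of aligned fetch units of `unit_bytes` (e.g. 4 for an I-bus word,
   16 for a 128-bit flash line; must be non-zero) touched by an instruction
   of `size` bytes (2 or 4) starting at `addr`. Returns 0 for an odd
   address, which is not a legal Thumb instruction address. */
static unsigned fetch_units_for_insn(uint32_t addr, unsigned size, uint32_t unit_bytes)
{
    if ((addr & 1u) != 0u || (size != 2u && size != 4u))
        return 0u;
    uint32_t first = addr / unit_bytes;
    uint32_t last = (addr + size - 1u) / unit_bytes;
    return (unsigned)(last - first) + 1u;
}

/* In this model: bytes left in the current unit after a branch lands at
   `addr`, i.e. how much of the fetched data is still usable before the
   next unit must be fetched. */
static uint32_t bytes_left_in_unit(uint32_t addr, uint32_t unit_bytes)
{
    return unit_bytes - (addr % unit_bytes);
}
```

For example, a 32-bit instruction at 0x08000102 needs two word fetches, and one at 0x0800000E spans two 128-bit lines, while at 0x0800000C it fits in one. GCC's `-falign-functions=4` (or a larger value such as 16) is one way to keep function entry points away from such boundaries, at the cost of some padding.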