2014-01-10 08:10 AM
Hello. I am looking at one of the GNU linker scripts for the ChibiOS project:
https://github.com/mabl/ChibiOS/blob/master/os/ports/GCC/ARMCMx/STM32F4xx/ld/STM32F405xG.ld
I understand this script for the most part, but I am very confused by the alignment of the sections. Namely:
Thanks for the help!
Sam
#stm32f405-linker-scatter-gnu-ld
2014-01-13 05:54 PM
Implementation details are hard to come by, but one could use the core's cycle counter to make reasonable guesses about how the ART functions.
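As a rough sketch of the cycle-counter approach, the C below enables the DWT cycle counter found on Cortex-M3/M4 parts and times one call of a function. The register addresses and bit positions are the architectural ARMv7-M ones; the function names (`cyccnt_enable`, `cyccnt_read`, `cyccnt_elapsed`, `cyccnt_time`) are made up for this illustration and are not part of ChibiOS or any vendor library. The register accesses only mean something when run on the target itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ARMv7-M debug registers (Cortex-M3/M4); only valid on the target. */
#define DEMCR      (*(volatile uint32_t *)(uintptr_t)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)(uintptr_t)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)(uintptr_t)0xE0001004u)

#define DEMCR_TRCENA       (1u << 24) /* enable the DWT unit        */
#define DWT_CTRL_CYCCNTENA (1u << 0)  /* enable the cycle counter   */

/* Hypothetical helper: switch the cycle counter on and zero it. */
void cyccnt_enable(void)
{
    DEMCR |= DEMCR_TRCENA;
    DWT_CYCCNT = 0u;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
}

/* Hypothetical helper: read the free-running 32-bit cycle count. */
uint32_t cyccnt_read(void)
{
    return DWT_CYCCNT;
}

/* Unsigned subtraction stays correct across one counter wrap-around. */
uint32_t cyccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical helper: core cycles spent in one call of fn,
 * including the call overhead itself. */
uint32_t cyccnt_time(void (*fn)(void))
{
    assert(fn != NULL);
    uint32_t start = cyccnt_read();
    fn();
    return cyccnt_elapsed(start, cyccnt_read());
}
```

Timing the same routine placed at different alignments, or with the ART options switched on and off, is one way to collect the kind of evidence described here; the numbers will include call overhead and any interrupts that fire during the measurement.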
The design of the ART isn't going to have an instruction-spanning issue: the 32-bit words the core reads will always fall on a 32-bit boundary, and the core will assemble the instruction stream internally, prefetching as it goes. One easy optimization for the flash would be to queue up the next read; this may well not even get into the ART cache, but it could be ready if required. If it does get into the ART cache, it is going to evict something else. For something that's strapped on outside the core, it operates quite efficiently. As I recall, the tests I did suggested it is faster than executing out of RAM, but less predictable.
2014-01-14 02:18 AM
Hi
"How does the core decode the combination of 16 and 32 bit instructions?"
Technically, ARM Cortex-M4 cores are RISC processors, which is often taken to mean that all instructions are the same width. However, ARM has more than one instruction set across its variants, and the one that matters here is Thumb. The Cortex-M4 executes only Thumb code (with the Thumb-2 extensions), which mixes 16-bit and 32-bit instructions in a single stream. Most toolchains (e.g. IAR and Atollic) have an option to choose between the ARM and Thumb instruction sets, but that choice only exists on cores that support both.
"What happens if it tries to load a 16-bit and half of a 32-bit instruction?"
Because Thumb-2 instructions only need 16-bit alignment, a 32-bit instruction can straddle a fetch boundary. As Clive1 says, the core assembles the stream internally, so this is handled for you (I could be wrong on the details though, I rarely look at that level). Sadly, this further complicates the issue for you, but you did ask about 16-bit wide instructions.
As Clive1 points out, the cache is loaded one line (128 bits) at a time. This may or may not help execution speed; it depends on what is going to happen next. By forcing the code to start on a 128-bit boundary, ChibiOS works within the STM32 architecture. This does not guarantee it will be faster than another OS, it just gives it a better chance.
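To make the 128-bit boundary arithmetic concrete, here is a small C sketch. It assumes a 16-byte (128-bit) flash line, as described above, and that a 32-bit Thumb-2 instruction may sit at any 16-bit-aligned address. `align_up` does the same rounding that `ALIGN(16)` performs in a GNU ld script; both function names and `FLASH_LINE_BYTES` are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flash line width: 128 bits = 16 bytes. */
#define FLASH_LINE_BYTES 16u

/* Round addr up to the next multiple of align (a power of two).
 * This is the same rounding ALIGN(align) applies in a GNU ld script. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (addr + align - 1u) & ~(align - 1u);
}

/* True if the size bytes starting at addr touch more than one flash line. */
bool straddles_line(uint32_t addr, uint32_t size)
{
    assert(size != 0u);
    uint32_t first_line = addr / FLASH_LINE_BYTES;
    uint32_t last_line  = (addr + size - 1u) / FLASH_LINE_BYTES;
    return first_line != last_line;
}
```

With these definitions, `align_up(0x08000001, 16)` gives `0x08000010`, and a 4-byte instruction at `0x0800000E` spans two lines while the same instruction at `0x0800000C` does not. Aligning the start of a section only fixes where its first instruction lands; everything after that depends on the instruction sizes the compiler emits.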