2017-12-24 10:54 AM
The manuals mention 'ART accelerator allowing 0-wait state execution'.
Do all STM32 models run off the internal flash at maximum speed? Would code copied to RAM run faster?
-- pa
#performance
2017-12-24 01:38 PM
The ART can deliver data to the prefetch port within the current cycle, so on a cache hit it is faster than SRAM. Overall it nets out about even.
2017-12-25 10:00 AM
The manuals mention 'ART accelerator allowing 0-wait state execution'.
That's a marketing lie, of course: 0 wait state holds only if every jump is matched by an entry in the jump cache and the linear stretches of code contain enough single-halfword instructions. This is roughly fulfilled by simple code with lots of local loops, which is what most benchmarks are all about.
There is also interaction with constant and data memory accesses, with the activity of other bus masters, and so on.
Would code copied to RAM run faster?
Execution through the S-bus is slower than through the I-bus, so when executing from SRAM you may want to remap the SRAM to the boot position (the alias at address 0x00000000).
One of the short-lived Technical Updates contained an interesting analysis of execution times for various mapping options.
Do all STM32 models run off the internal flash at maximum speed?
AFAIK only the F4 and the F7 (on its TCM interface) have the ART.
JW
2017-12-25 01:53 PM
The flash and cache lines are 128 bits wide. All of the wait states for loading a line are paid on the first word; the next seven 16-bit words then have zero load time, whereas SRAM still costs a cycle apiece. Note that this is not the same as zero wait state.
If you have linear execution or hit the cache it will run faster.
2018-01-03 05:11 AM
Thanks for the helpful info. -- pa