ART accelerator of STM32H7

dqi07234 · ‎2016-10-25

Posted on October 26, 2016 at 08:45

Hello experts,

When I compare the block diagrams of the STM32F7 and STM32H7, the STM32H7 seems to have no ART accelerator. Can't software access the internal Flash without any wait cycles?

Best regards,

Yasuhiko Koumoto.

#stm32h7-art

waclawek.jan · ‎2016-10-26

Posted on October 26, 2016 at 09:05

It appears that the sole source of FLASH latency mitigation is the L1 cache now.

JW

dqi07234 · ‎2016-10-26

Posted on October 27, 2016 at 06:31

Thank you for your reply.

Isn't it impossible to get time-deterministic operation with the L1 cache?

An L1 cache miss means deterministic operation cannot be guaranteed.

Is that correct?

Best regards,

Yasuhiko Koumoto.

waclawek.jan · ‎2016-10-27

Posted on October 27, 2016 at 13:57

"Deterministic" is an expression which requires a lot of stretching in EVERY 32-bitter. These are SoCs rather than microcontrollers, with a great number of sources of jitter/delay/wait/resource-conflict, name it whatever you want. As long as it all runs off a common clock - which is not guaranteed either in some cases - all of that is still very deterministic, except that the *determine* part of it gets extremely complicated, far beyond the point of being economic. And, of course, the full information needed to *determine* is usually not publicly available. The manufacturer may provide you with cycle-precise simulations, and from what I have learned they are willing to do so if you provide enough economic incentive - we are speaking millions worth of chips now. And all that is only for severely limited scenarios, from the usual application point of view.

In particular, the cache in Cortex-M7 is not less deterministic than the ART (which is nothing more than a jumpcache). AFAIK, neither has lockable portions (as the jumpcache in the venerable 100MHz Silabs/Cygnal '51 has had for ages). You of course can avoid the cache and increase determinism somewhat in exchange for execution speed. Or you can try to execute from non-cached SRAM, although I am not sure what sort of penalty that may involve.

The simplest way to cope with this all is to give up any hopes for cycle-precise-determinism-of-processor-controlled-events for any 32-bitter, up front. The best you can hope for is in the order of dozens/grosses of cycles. Rely on the hardware attached to the core, it's extensive for a reason (and it may be less-than-cycle-precise-deterministic as well, if shared resources are involved, e.g. when using DMA).
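To see how much jitter a given code path really has, one option is to timestamp it repeatedly and look at the spread of the intervals. Below is a minimal sketch of just the arithmetic, assuming the timestamps come from a free-running 32-bit cycle counter (on a Cortex-M7 that would typically be the DWT cycle counter, DWT->CYCCNT). The function name interval_spread is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given n timestamps from a free-running 32-bit
 * cycle counter, report the shortest and longest interval between
 * consecutive samples and return their difference (the observed jitter).
 * Unsigned subtraction keeps the result correct across one counter wrap. */
uint32_t interval_spread(const uint32_t *stamps, size_t n,
                         uint32_t *min_out, uint32_t *max_out)
{
    uint32_t lo = UINT32_MAX;
    uint32_t hi = 0;

    if (n < 2) {            /* no interval to measure */
        *min_out = 0;
        *max_out = 0;
        return 0;
    }
    for (size_t i = 1; i < n; i++) {
        uint32_t d = stamps[i] - stamps[i - 1];  /* wraps modulo 2^32 */
        if (d < lo) lo = d;
        if (d > hi) hi = d;
    }
    *min_out = lo;
    *max_out = hi;
    return hi - lo;
}
```

Running the same measurement with caches on and off, or with DMA traffic active, gives a rough idea of what the shared resources cost. It only shows what was observed, not a guaranteed worst case.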

And, of course, we (i.e. engineer types, as opposed to the manager types) all know the value of benchmarks. The 5.01 CoreMark/MHz tag is undoubtedly due to CoreMark being limited enough in requirements to fit all the data and code into the caches (as it was with the ART, fine-tuned for the same purpose in the 'F4, to claim "0-waitstate execution"). This is of course far from being true for any nontrivial real-world application.

JW

Renaud BOUZEREAU · ‎2017-06-01

Posted on June 01, 2017 at 12:19

The I-TCM and D-TCM memories are meant to host deterministic parts of code and data.
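As a rough sketch of what that could look like with GCC: time-critical code and its data are tagged with section attributes, and the linker script maps those sections to the TCM regions. The section names .itcm_text and .dtcm_data, the macros and the control_step routine are assumptions for illustration only. The real names depend on your linker script, and the startup code has to copy or zero these sections before use.

```c
#include <stdint.h>

/* Assumption: the linker script provides output sections ".itcm_text"
 * (mapped to I-TCM) and ".dtcm_data" (mapped to D-TCM). These names are
 * hypothetical. On a non-ARM host the macros expand to nothing, so the
 * file still builds for off-target testing. */
#if defined(__arm__) && defined(__GNUC__)
#define IN_ITCM __attribute__((section(".itcm_text"), noinline))
#define IN_DTCM __attribute__((section(".dtcm_data")))
#else
#define IN_ITCM
#define IN_DTCM
#endif

/* Controller state kept in D-TCM; must be zeroed by startup code. */
IN_DTCM static volatile int32_t integrator;

/* Illustrative time-critical routine placed in I-TCM: a simple integer
 * PI step with fixed gains (P = 4, I = 1/8). */
IN_ITCM int32_t control_step(int32_t setpoint, int32_t measured)
{
    int32_t error = setpoint - measured;

    integrator += error;
    return 4 * error + integrator / 8;
}
```

Code and data placed this way bypass the L1 caches and the Flash wait states. Whether everything the routine touches (stack, peripherals, any shared bus) is equally predictable still has to be checked separately.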