Task with a control loop runs slower on STM32F407IEH6 than on STM32F746IGK6
2017-07-11 3:29 AM
Hi,
I am using STM32F407IEH6 and STM32F746IGK6 boards in my application. The application runs 10 tasks under a scheduler.
-- The application has the same code on both the STM32F407IEH6 and the STM32F746IGK6.
-- The two boards run at the same clock speed.
* All tasks run at the same rate on both boards,
* except the task with control loops, which takes more time (in microseconds: for example, if the STM32F746IGK6 takes 15 microseconds, the STM32F407IEH6 takes 18 to 20 microseconds).
Can anyone explain why the STM32F407IEH6 takes more time to run the task with control loops?
Thanks
Santosh
2017-07-11 3:45 AM
Flash interface settings?
Are the caches on the M7 enabled?
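One flash interface setting that scales with clock speed is the number of flash wait states (the LATENCY field of FLASH_ACR); on the F407, per RM0090, FLASH_ACR also holds the prefetch (PRFTEN) and ART cache enables (ICEN, DCEN). The sketch below assumes a 2.7 to 3.6 V supply, where each extra wait state covers another 30 MHz of HCLK; other voltage ranges use different steps, so check the reference manual table before relying on it. `flash_wait_states` is a hypothetical helper, not an ST HAL function.

```c
#include <stdint.h>

/* Hypothetical helper: minimum flash wait states for a given HCLK,
 * ASSUMING a 2.7-3.6 V supply (30 MHz per wait state):
 * 0 WS up to 30 MHz, 1 WS up to 60 MHz, 2 WS up to 90 MHz, ... */
static uint32_t flash_wait_states(uint32_t hclk_hz)
{
    const uint32_t step_hz = 30000000u; /* 30 MHz per wait state at 2.7-3.6 V */

    if (hclk_hz == 0u) {
        return 0u;
    }
    /* Subtract 1 so an exact multiple of 30 MHz stays in the lower band. */
    return (hclk_hz - 1u) / step_hz;
}
```

At the same HCLK and supply voltage both boards need the same LATENCY value (for example 5 wait states at 168 MHz), so a mismatch between the two projects' clock setup code is worth ruling out first.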
2017-07-11 3:59 AM
Hi,
1. My application runs from flash on both boards.
2. Caches are enabled on the M7.
Thanks for your reply.
2017-07-11 6:09 AM
To be more direct: the suggestion is to test the M7 with the caches disabled.
I am not sure the results would be identical then. With caches, there is no need for an ART-like prefetch buffer as on the M4.
You might read the M7 'fine print'.
With proper application design, a shorter runtime should not pose a problem.
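To compare "caches on" against "caches off" it helps to measure in core cycles rather than wall-clock microseconds. A minimal sketch, assuming a free-running 32-bit counter is available: on Cortex-M3/M4/M7 that is typically the DWT cycle counter (CMSIS `DWT->CYCCNT`, enabled via `CoreDebug_DEMCR_TRCENA_Msk` in `CoreDebug->DEMCR` and `DWT_CTRL_CYCCNTENA_Msk` in `DWT->CTRL`; on some M7 parts the DWT lock access register may also need unlocking), and on the M7 the CMSIS `SCB_DisableICache()` / `SCB_DisableDCache()` calls switch the caches off for the test. The harness names below (`measure_cycles`, `cycles_to_ns`, the fake counter) are hypothetical; the fake lets the harness be exercised on a PC.

```c
#include <stdint.h>

/* Any free-running 32-bit counter: DWT->CYCCNT on target, a fake on a PC. */
typedef uint32_t (*cycle_reader_fn)(void);

/* Runs fn once and returns the elapsed counter ticks. Unsigned subtraction
 * gives the right answer even if the counter wrapped once during fn. */
static uint32_t measure_cycles(cycle_reader_fn read_cycles, void (*fn)(void))
{
    uint32_t start = read_cycles();
    fn();
    return read_cycles() - start;
}

/* Converts ticks to nanoseconds; core_hz must be nonzero.
 * 64-bit math avoids overflow of cycles * 1e9. */
static uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return ((uint64_t)cycles * 1000000000u) / core_hz;
}

/* Host-side fake counter, started just below the 32-bit wrap point. */
static uint32_t fake_now = 0xFFFFFFF0u;
static uint32_t fake_read(void) { return fake_now; }
static void fake_work(void) { fake_now += 0x20u; } /* pretends the work took 32 ticks */
```

Reporting cycles makes the M4/M7 comparison independent of the exact clock setup: at 168 MHz, 15 microseconds corresponds to 2520 cycles.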
2017-07-11 6:41 AM
The ART in this case is a trade-off between the inherent slowness of flash access and its very wide width. With sequential access, subsequent words cost 0 cycles, compared to 1 cycle directly from SRAM. On the F7 they threw away the caching part because the core now has caching at the architecture level, as opposed to having it bolted on the side.
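A rough worked model of why the wide flash matters on the F4: the flash is read 128 bits (16 bytes) at a time, so one line fetch delivers four 32-bit words or eight 16-bit Thumb instructions. The sketch below is a simplified worst case, and an assumption rather than ST's exact timing: each new line stalls for LATENCY cycles, the rest of the line is free, and prefetch and the ART cache are ignored. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Number of 128-bit flash lines touched by straight-line code of the
 * given size, assuming it starts on a 16-byte boundary. */
static uint32_t flash_lines_touched(uint32_t code_bytes)
{
    const uint32_t line_bytes = 16u; /* 128-bit flash line */
    return (code_bytes + line_bytes - 1u) / line_bytes;
}

/* Simplified worst case: every line fetch pays the full wait states and
 * nothing overlaps with execution (no prefetch, no ART hits). */
static uint32_t estimated_fetch_stalls(uint32_t code_bytes, uint32_t wait_states)
{
    return flash_lines_touched(code_bytes) * wait_states;
}
```

For a 64-byte loop body at 5 wait states this gives 4 lines and 20 stall cycles per pass without any acceleration; prefetch hides much of that for sequential code and the ART cache removes it on later iterations, while a branch to an address outside the cached lines pays it again.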
The M7 should be able to provide cached and non-cached views of memory, although one should be conscious of coherency.
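On coherency: when a DMA engine and the cached core share a buffer, the usual pattern is to clean the D-cache before the DMA reads the buffer and invalidate it after the DMA writes it (CMSIS `SCB_CleanDCache_by_Addr` / `SCB_InvalidateDCache_by_Addr`). Cache maintenance works on whole 32-byte lines on the Cortex-M7, so an unaligned buffer drags in neighbouring data, and invalidating it can silently discard someone else's writes. The hypothetical helpers below compute the range actually affected and check whether a buffer is line-exact; the alternative is marking the region non-cacheable through the MPU.

```c
#include <stdint.h>

#define DCACHE_LINE_BYTES 32u /* Cortex-M7 L1 data cache line size */

typedef struct {
    uintptr_t start;  /* first byte of the first cache line touched */
    uint32_t  length; /* bytes, always a whole number of cache lines */
} cache_span_t;

/* Expands [addr, addr + len) to the whole cache lines that a
 * clean/invalidate-by-address operation will actually act on. */
static cache_span_t cache_span(uintptr_t addr, uint32_t len)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_BYTES - 1u);
    uintptr_t end = addr + len;
    uintptr_t end_aligned = (end + mask) & ~mask;
    cache_span_t s;

    s.start = addr & ~mask;
    s.length = (uint32_t)(end_aligned - s.start);
    return s;
}

/* Nonzero if the buffer starts and ends on line boundaries, i.e. invalidating
 * it cannot affect any data outside the buffer. */
static int cache_span_is_exact(uintptr_t addr, uint32_t len)
{
    return (addr % DCACHE_LINE_BYTES) == 0u && (len % DCACHE_LINE_BYTES) == 0u;
}
```

In practice this means declaring DMA buffers 32-byte aligned and sized in multiples of 32 bytes.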
The M7 is also multi-issue, a bit like the original Pentium was effectively 1.5x an 80486: add just enough extra logic to frequently allow two instructions to execute together, provided they don't use the same logic and don't depend on each other's results. An M7-aware compiler is likely to reorder instructions and change register usage so the code runs faster. As pipeline depth increases, branches (non-linear code flow) get more expensive. The M7 has branch prediction.
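To illustrate the "don't depend on each other's results" point, a common source-level trick is splitting one dependency chain into two. This is a sketch of the general technique, not a guaranteed M7 speed-up: whether the core actually overlaps the operations depends on the instruction mix and the compiler, so it needs measuring. Compilers usually won't do this for `float` on their own (without options like `-ffast-math`), because regrouping the additions can change rounding.

```c
#include <stddef.h>

/* One accumulator: every add waits for the previous one (a serial chain). */
static float sum_serial(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}

/* Two independent accumulators: neighbouring adds don't depend on each
 * other, giving a multi-issue pipeline more freedom to overlap work.
 * Results can differ from sum_serial in the last bits due to rounding. */
static float sum_two_chains(const float *x, size_t n)
{
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc0 += x[i];
        acc1 += x[i + 1];
    }
    if (i < n) {
        acc0 += x[i]; /* leftover element when n is odd */
    }
    return acc0 + acc1;
}
```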
2017-07-11 10:34 PM
Hi,
I tried disabling the caches on the M7, and it still runs faster than the M4. Is any other feature making it run faster?
2017-07-11 10:54 PM
Thanks for the reply. So the execution time of a task on the M4 versus the M7 depends on instruction fetch, pipelining, and execution speed, even though I have the same code on all the cores.
:)
2017-07-12 12:28 AM
Clive said it above: the M7 is superscalar. IIRC it may also have speculative prefetch.
JW
2017-07-12 10:37 AM
Same code at the 'C' level or at the assembler level?
Either way, the architecture of the M7 is designed to exceed parity. I'm not sure why this is critically important; the discussion seems circular. If you use an M7 with the full double-precision FPU (FPU-D) and recompile that type of math to make use of it, it is going to perform significantly better than the M4.
What's the problem with the M7 running faster than the M4? If timing is critical, use timers to enforce the required timeline/pace.
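A related trap when comparing FPU performance: the Cortex-M4 in the STM32F407 has a single-precision FPU only, so any `double` arithmetic there runs in software, and not every M7 part has the double-precision FPU either (check the datasheet). In C an unsuffixed literal such as `0.5` is a `double` and promotes the whole expression, so a control loop that looks like float math can quietly become double math. A sketch, with a hypothetical PI controller kept entirely in single precision; GCC's `-Wdouble-promotion` warning helps catch the accidental cases.

```c
/* Hypothetical PI controller state, all single precision. */
typedef struct {
    float kp;    /* proportional gain */
    float ki;    /* integral gain */
    float integ; /* integrator state */
    float dt;    /* sample time in seconds */
} pi_ctrl_t;

static float pi_step(pi_ctrl_t *c, float setpoint, float measured)
{
    float err = setpoint - measured;
    c->integ += err * c->dt; /* float * float: stays in the single-precision FPU */
    return c->kp * err + c->ki * c->integ;
}

/* 0.5 is a double literal: err is promoted to double, the multiply is done
 * in double (software on a single-precision FPU), then converted back. */
static float half_err_double(float err) { return err * 0.5; }

/* 0.5f keeps the whole expression in float. */
static float half_err_float(float err) { return err * 0.5f; }
```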
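One way to enforce the pace is to run the control step from a periodic tick (for example SysTick or a hardware timer interrupt) and schedule against absolute deadlines, so a faster core simply waits longer instead of running the loop more often. A minimal sketch with hypothetical helper names; the signed-difference comparison keeps working when the 32-bit tick counter wraps, as long as the two values are less than 2^31 ticks apart (the cast to `int32_t` is implementation-defined in older C standards but behaves as two's complement on GCC and Arm compilers).

```c
#include <stdint.h>

/* Nonzero once `now` has reached or passed `deadline`, wrap-safe. */
static int tick_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* Advance from the previous deadline, not from "now", so a late cycle
 * does not push every following cycle later (no accumulated drift). */
static uint32_t next_deadline(uint32_t deadline, uint32_t period)
{
    return deadline + period;
}

/* Usage on target (get_ticks, control_step and PERIOD are hypothetical):
 *   uint32_t deadline = get_ticks() + PERIOD;
 *   for (;;) {
 *       while (!tick_reached(get_ticks(), deadline)) { }  // or sleep until the timer IRQ
 *       control_step();
 *       deadline = next_deadline(deadline, PERIOD);
 *   }
 */
```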
2017-07-13 1:00 AM
My application is written in C.
I have 4 controllers: 2 are M4 and the remaining 2 are M7. After measuring execution time on these boards, I noticed that the tasks with loops take more time on the M4. Anyway, I may upgrade them all to M7 in the future.
Thanks for the help.
