Task which has control loop runs slower in STM32F407IEH6 compared to STM32F746IGK6

Santhosh KM
Associate II
Posted on July 11, 2017 at 12:29

Hi,

I am using STM32F407IEH6 and STM32F746IGK6 boards in my application, which runs 10 tasks under a scheduler.

-- The application has the same code on both the STM32F407IEH6 and the STM32F746IGK6.

-- Both boards are running at the same clock speed.

* All the tasks run at the same rate on both boards,

* except the task that has the control loops, which takes more time on the STM32F407IEH6 (in microseconds: where the STM32F746IGK6 takes 15 µs, the STM32F407IEH6 takes 18 or 20 µs).

Can anyone please explain why the STM32F407IEH6 takes more time to run the task that has the control loops?

Thanks

Santosh
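A timing difference of a few microseconds like the one described above is usually measured in core cycles. Here is a minimal sketch of the arithmetic, assuming the common approach of reading the DWT cycle counter (`DWT->CYCCNT` from the CMSIS headers) before and after the task. The function names, `control_task()`, and the 168 MHz clock are illustrative assumptions; the thread does not state the actual clock or how the time was measured.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed core cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single counter wrap. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds for a core clock given in Hz. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0u);
    return ((uint64_t)cycles * 1000000000ull) / core_hz;
}

/* On target (assumption: CMSIS device header included):
 *
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  // enable trace block
 *   DWT->CYCCNT = 0;
 *   DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;            // start cycle counter
 *
 *   uint32_t t0 = DWT->CYCCNT;
 *   control_task();                                  // hypothetical task
 *   uint32_t dt = elapsed_cycles(t0, DWT->CYCCNT);
 *
 * Some M7 parts may also need the DWT unlocked first; check the
 * reference manual for the device in use.
 */
```

With an assumed 168 MHz clock, 2520 cycles correspond to 15 µs and 3360 cycles to 20 µs, so the gap reported here would be on the order of several hundred cycles.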

9 REPLIES
AvaTar
Lead
Posted on July 11, 2017 at 12:45

Flash interface settings?

Caches on the M7 enabled?

Santhosh KM
Associate II
Posted on July 11, 2017 at 12:59

Hi, 

1. My application is running from flash on both

2. Caches are enabled on M7

Thanks for your reply

Posted on July 11, 2017 at 13:09

To be more direct: the suggestion would be to test the M7 without caches.

I'm not sure the results would be identical then. With caches, there is no need for an ART-like prefetch buffer such as the one on the M4.

You might read the M7 'fine print'.

With proper application design, reduced runtime should not pose a problem.
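The "test the M7 without caches" suggestion could be sketched as below. This assumes the CMSIS core functions `SCB_EnableICache()`, `SCB_DisableICache()`, `SCB_EnableDCache()` and `SCB_DisableDCache()` from the Cortex-M7 headers; the wrapper name `m7_set_caches` and the host-side stand-ins are hypothetical and only there so the sketch also builds off-target.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(STM32F746xx)
#include "stm32f7xx.h"  /* CMSIS device header, declares the SCB_*Cache() calls */
#else
/* Host-side stand-ins (hypothetical) so the sketch also builds off-target. */
static bool icache_on = true;
static bool dcache_on = true;
static void SCB_EnableICache(void)  { icache_on = true;  }
static void SCB_DisableICache(void) { icache_on = false; }
static void SCB_EnableDCache(void)  { dcache_on = true;  }
static void SCB_DisableDCache(void) { dcache_on = false; }
#endif

/* Switch both M7 caches on or off before a timing run. */
void m7_set_caches(bool enable)
{
    if (enable) {
        SCB_EnableICache();
        SCB_EnableDCache();
    } else {
        SCB_DisableDCache();
        SCB_DisableICache();
    }
}
```

Calling `m7_set_caches(false)` early in `main()` and timing the control task again would show how much of the difference comes from the caches.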

Posted on July 11, 2017 at 13:41

The ART in this case is a trade-off between the inherent slowness of Flash access and its very wide width. Where you have sequential access, the cost of subsequent words is 0 cycles, compared to 1 cycle directly from SRAM. On the F7 they threw away the caching part because it now has caching at the architecture level, as opposed to bolting it on the side.

The M7 should be able to provide cached and non-cached views of memory, although one should be conscious of coherency.

The M7 is also multi-issue, a bit like the original Pentium was effectively 1.5x an 80486, i.e. it adds just enough additional logic to frequently allow two instructions to execute together, provided they don't use the same logic and don't rely on each other's results. A compiler that is M7-aware is likely to alter instruction ordering and register usage so the code runs quicker. As the pipeline depth increases, branches (non-linear code flow) get more expensive. The M7 has branch prediction.
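To make the "didn't rely on the results from the other" condition concrete, here is a small illustrative sketch (not code from the application in question; the function names are made up). Both functions compute the same sum, but the second keeps two independent accumulators, which is the kind of instruction-level independence a dual-issue core or an M7-aware compiler can exploit. Whether it is actually faster depends on the compiler and the core, so it would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the result of the previous add. */
uint32_t sum_single(const uint32_t *x, size_t n)
{
    assert(x != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}

/* Two accumulators: the two adds in each iteration are independent of
 * each other, so they are candidates for being issued together. */
uint32_t sum_dual(const uint32_t *x, size_t n)
{
    assert(x != NULL || n == 0);
    uint32_t a = 0;
    uint32_t b = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a += x[i];
        b += x[i + 1];
    }
    if (i < n) {
        a += x[i];  /* odd element left over */
    }
    return a + b;   /* unsigned wraparound makes the result order-independent */
}
```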

Posted on July 12, 2017 at 05:34

Hi,

I have tried disabling the caches on the M7, and it still runs faster than the M4. Is any other feature making it run faster?

Santhosh KM
Associate II
Posted on July 12, 2017 at 07:54

Thanks for the reply. So the execution time of a task on the M4 versus the M7 depends on instruction fetch, pipelining and execution speed, even though I have the same code on all the cores.

🙂

Posted on July 12, 2017 at 07:28

Clive said it above: the M7 is superscalar. IIRC it also may have a speculative prefetch.

JW

Posted on July 12, 2017 at 17:37

Same code at the C level or at the assembler level?

Either way the architecture of the M7 is designed to exceed parity. I'm not sure why this is critically important, the discussion seems circular. If you use the M7 with the full FPU-D it's going to be significantly better performing than the M4 with that type of math recompiled to make use of the FPU.

What's the problem with the M7 running faster than the M4? If time is a critical factor use timers to enforce the timeline/pace required.

Santhosh KM
Associate II
Posted on July 13, 2017 at 10:00

My application is in C code.

I have 4 controllers: 2 of them are M4 and the remaining 2 are M7. After measuring the time on these boards, I noticed that the tasks with loops take more time on the M4. Anyway, I may upgrade all of them to M7 in the future.

Thanks for the help.