STM32F767 Execution time is more compared with STM32F429

RBG
Associate II

I tried creating an OS task 100 times, using the FreeRTOS example code from STM32CubeMX, on both the F429 and the F767, and observed the following:

F429 - 6 Ticks

F767 - 16 Ticks

Difference - 10 Ticks

What is the reason for the delay, and is there any way to speed it up?
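
To put the tick counts above in perspective, here is a rough conversion from ticks to CPU cycles per task creation. It is only a sketch: the 1 ms tick (a `configTICK_RATE_HZ` of 1000) and the core clocks of 180 MHz for the F429 and 216 MHz for the F767 are assumptions, not values stated in the post, and `cycles_per_op` is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured number of RTOS ticks into CPU cycles per operation.
 * Every input is supplied by the caller; nothing is read from hardware.
 * Assumes core_hz is a whole multiple of tick_rate_hz. */
uint32_t cycles_per_op(uint32_t ticks, uint32_t tick_rate_hz,
                       uint32_t core_hz, uint32_t ops)
{
    assert(tick_rate_hz > 0 && ops > 0);

    /* Cycles in one tick, times the number of ticks, spread over all operations. */
    uint64_t total_cycles = (uint64_t)ticks * (core_hz / tick_rate_hz);
    return (uint32_t)(total_cycles / ops);
}
```

Under those assumptions, `cycles_per_op(6, 1000, 180000000u, 100)` gives 10800 and `cycles_per_op(16, 1000, 216000000u, 100)` gives 34560. With only 6 and 16 ticks measured, a one-tick quantisation error is a large fraction of the result, so a finer timer would be needed before trusting the ratio.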

14 REPLIES
Danish1
Lead III

One reason the 'F7 can be slower is that it has a longer pipeline. On any branch (function call, if, goto, loop), any partly-executed instructions in the pipeline have to be abandoned and the new instruction sequence has to be loaded. (Inlining a function call eliminates this.)
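
As an illustration of the inlining remark, a minimal sketch follows. The names `add_one` and `count_up` are hypothetical and not taken from the original test code. Marking the helper `static inline` lets the compiler paste its body into the loop so that no call and return are executed per iteration; the loop's own backward branch is still there.

```c
#include <stdint.h>

/* Small helper. 'static inline' is only a hint: the compiler decides
 * whether the body is actually pasted into the caller. */
static inline uint32_t add_one(uint32_t x)
{
    return x + 1u;
}

/* Hypothetical benchmark loop: bumps a counter n times through the helper. */
uint32_t count_up(uint32_t n)
{
    uint32_t a = 0;
    for (uint32_t i = 0; i < n; ++i) {
        a = add_one(a);  /* if inlined, no call/return branch happens here */
    }
    return a;
}
```

Whether the call really disappears depends on the compiler and the optimisation level (GCC, for example, normally does not inline at `-O0`), so it is worth checking the disassembly rather than assuming.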

Why do this? Because that means the processor can be clocked at a higher frequency - if you choose to do so.

The 'F7 has the advantage that it can sometimes execute two instructions simultaneously, which the 'F4 cannot. This depends very much on the data dependencies between successive instructions, and it takes a clever compiler run at a high optimisation level to take full advantage of it.
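
A rough sketch of what "data dependencies between successive instructions" means in source code. The function names are hypothetical. In the first loop every addition needs the result of the previous one; in the second, the two running sums are independent of each other, which at least gives a dual-issue core something it could pair up.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add depends on the one before it (a serial chain). */
uint32_t sum_serial(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

/* Two accumulators: the adds into s0 and s1 do not depend on each other. */
uint32_t sum_paired(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n) {
        s0 += v[i];  /* odd element left over */
    }
    return s0 + s1;
}
```

Both functions return the same value (unsigned arithmetic wraps identically in either order). Whether the second is actually faster on a given part depends on the compiler output and on where code and data live, so it has to be measured.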

What optimisation-level were you compiling at? F7 is likely to optimise better.

You will find examples where F4 wins over F7 in terms of cycle count. And you'll find examples where F7 wins.

Hope this helps,

Danish

Is the longer pipeline a disadvantage, then? The F429 performs better than the F767 for simple code.

I am building for both boards at a high optimization level.

0690X00000BwbZDQAZ.jpg

My concern is that the F767 takes more time than the F429 for malloc, creating OS tasks, printf calls and other major tasks.

Is there some configuration I can change to increase the speed, or should I conclude that the F767 is slower than the F429?

Yeah, that was a different approach, using tasks created with FreeRTOS.

Now I have tried simple code that increments an initialized global integer (a).

0690X00000BwbeNQAR.jpg

0690X00000BwWP8QAN.jpg

What is HAL_GetTick()?

Did you observe the disasm in both cases?

Did you read back and observe the RCC registers?

From where does the code run in both cases? How are caches involved?

Where are all related variables located? How are caches involved?

Is stack involved? If so, where is it located and how are caches involved?

How does freertos influence the above loop?

I am not interested in the answers; this is just a sample of the questions you should perhaps be interested in.

JW
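
Several of the questions above come down to how the time is measured. Below is a minimal, host-runnable sketch of timing a piece of code with a free-running counter instead of a 1 ms tick. Everything in it is an assumption made for illustration: `counter_fn`, `elapsed_counts`, `fake_now` and `work` are invented names, and the fake counter only stands in for a real one. On a Cortex-M7 the hook could, for instance, return the DWT cycle counter (`DWT->CYCCNT` in CMSIS) once that counter has been enabled.

```c
#include <stdint.h>

/* Hypothetical hook: returns a free-running 32-bit counter. */
typedef uint32_t (*counter_fn)(void);

/* Time one call of fn(). Unsigned subtraction keeps the result
 * correct even if the counter wraps once during the call. */
uint32_t elapsed_counts(counter_fn now, void (*fn)(void))
{
    uint32_t start = now();
    fn();
    return now() - start;
}

/* --- Stand-ins so the sketch runs on a host --- */

/* Fake counter, deliberately started close to the wrap point. */
static uint32_t fake_count = 0xFFFFFFF0u;

uint32_t fake_now(void)
{
    return fake_count;
}

/* 'volatile' stops an optimising compiler from folding the loop
 * into a single addition, which would make the timing meaningless. */
volatile uint32_t a = 0;

void work(void)
{
    for (int i = 0; i < 100; ++i) {
        a++;
        fake_count++;  /* pretend each iteration costs one count */
    }
}
```

Calling `elapsed_counts(fake_now, work)` returns 100 even though the fake counter wraps during the call. With a real counter the result would also include the overhead of the two counter reads and of the call to `fn`.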