
STM32F767 Execution time is more compared with STM32F429

RBG
Associate II

I created an OS task 100 times in the FreeRTOS example code generated by STM32CubeMX for both the F429 and the F767, and observed the following timings.

F429 - 6 Ticks

F767 - 16 Ticks

Difference - 10 Ticks

What is the reason for the delay, and is there any way to speed it up?

14 REPLIES
Uwe Bonnes
Principal III

Different number of wait states, Code in RAM? Show relevant parts!

RBG
Associate II

Below is the code snippet used for testing both the F429 and F767 boards. The tick difference was measured across the highlighted part.

0690X00000Bw3kbQAB.jpg

RBG
Associate II

Even for a simple malloc() I observed a tick difference between the F429 and the F767.

Results for 10000 memory allocations:

F429 - 68 Ticks

F767 - 78 Ticks

Difference - 10 Ticks

The code is below.

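void StartDefaultTask(void const * argument)
{
  int *ptr;

  /* USER CODE BEGIN 5 */
  /* Infinite loop */
  for(;;)
  {
    /* Tick count before the allocation loop */
    printf( "Tick_test_1:%lu\n", (unsigned long)xTaskGetTickCount() );

    for(long i=0;i<10000;i++)
    {
      ptr = (int*) malloc(5*sizeof(int));  /* never freed, so the heap shrinks on every pass */
    }

    /* Tick count after the allocation loop */
    printf( "Tick_test_2:%lu\n", (unsigned long)xTaskGetTickCount() );

    osDelay(1);
  }
}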

Piranha
Chief II

Do you understand that the first printf() and (I guess) the UART transmission underneath it are included in your measurement? Also, xTaskCreate() and malloc() both use dynamic memory and are not deterministic, either in processing time or in whether they succeed.
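A minimal sketch of a measurement that keeps printf() outside the timed region: sample the tick count into locals, run the loop, and report afterwards. The names `bench_malloc`, `bench_result`, `tick_fn` and `fake_tick` are hypothetical and not part of FreeRTOS or the CubeMX example; on the target, the tick source would be a small wrapper around xTaskGetTickCount(). Unlike the posted test, this version frees each block and counts failed allocations, so its numbers are not directly comparable with the ones above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tick source type; on the target, wrap xTaskGetTickCount(). */
typedef uint32_t (*tick_fn)(void);

typedef struct {
    uint32_t elapsed_ticks;  /* ticks spent in the allocation loop only */
    uint32_t failed_allocs;  /* how many malloc() calls returned NULL */
} bench_result;

bench_result bench_malloc(tick_fn now, uint32_t iterations, size_t bytes)
{
    bench_result r = {0u, 0u};
    uint32_t start = now();               /* sample before, do not print yet */

    for (uint32_t i = 0; i < iterations; i++) {
        void *volatile p = malloc(bytes); /* volatile store keeps the call in place */
        if (p == NULL) {
            r.failed_allocs++;            /* a failed call returns early and skews timing */
        }
        free(p);                          /* free(NULL) is a no-op */
    }

    uint32_t end = now();                 /* sample after, still no I/O */
    r.elapsed_ticks = end - start;        /* unsigned math tolerates one wraparound */
    return r;
}

/* Host-side stand-in for the tick counter: advances one tick per call. */
uint32_t fake_tick(void)
{
    static uint32_t t;
    return t++;
}
```

The caller can print `elapsed_ticks` and `failed_allocs` after the function returns, so the UART time never lands inside the measured interval.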

RBG
Associate II

Yes. I tried another approach; is this a better way to check the performance?

I incremented a variable for the duration of one tick, and the results are below.

F429 - a=976

F767 - a=691

The F767 does not get through the code as many times as the F429 does within a given tick.

The only task running is this default task, and the code base is the default simple example taken from STM32CubeMX.

0690X00000BwNgZQAV.jpg

Piranha
Chief II

Disable all interrupts (__disable_irq()/__enable_irq()) and use DWT->CYCCNT for precise measurement.

How are clocks, PLL, buses, flash and cache configured?
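The DWT->CYCCNT suggestion above could look roughly like the sketch below. The target-only part uses the standard CMSIS names (CoreDebug, DWT, __disable_irq()) and sits behind `BENCH_ON_TARGET`, a hypothetical macro invented here so the file also compiles on a host; `cycles_elapsed`, `cyccnt_init` and `measure_cycles` are likewise hypothetical helper names. On some Cortex-M7 parts the DWT may need to be unlocked before CYCCNT counts; that step is omitted here and is an assumption to check against the device documentation.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT samples. Unsigned subtraction gives the
   right answer even if the 32-bit counter wrapped once in between. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

#ifdef BENCH_ON_TARGET            /* hypothetical macro: define only for MCU builds */
#include "stm32f7xx.h"            /* or "stm32f4xx.h" for the F429 */

/* Enable the cycle counter once at startup. */
void cyccnt_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; /* power up the DWT block */
    DWT->CYCCNT = 0u;                               /* reset the counter */
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;            /* start counting core cycles */
}

/* Run work() with interrupts masked and return the cycles it took. */
uint32_t measure_cycles(void (*work)(void))
{
    __disable_irq();
    uint32_t start = DWT->CYCCNT;
    work();
    uint32_t end = DWT->CYCCNT;
    __enable_irq();
    return cycles_elapsed(start, end);
}
#endif /* BENCH_ON_TARGET */
```

Because the result is in core cycles rather than RTOS ticks, dividing by the core clock of each board gives times that can be compared directly between the F429 and the F767.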

RBG
Associate II

I tried attaching the complete code, but that is not allowed here. I am attaching snapshots of the main function and the system clock configuration functions.

Code is taken from STM32CubeMX V 4.24

Firmware package versions

F429 - STM32Cube_FW_F4_V1.9.0

F767 - STM32Cube_FW_F7_V1.15.0

Nothing else is changed in that example.

Results for the code below, with variable a kept in live watch:

F429 - a=998

F767 - a=661

0690X00000BwWP8QAN.jpg

0690X00000BwWOyQAN.jpg

Piranha
Chief II

Compare how HAL_Init() configures FLASH_ACR in both cases.
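One way to make that comparison concrete is to decode the register value read from each board. The sketch below is an illustration only: `flash_acr_fields` and `decode_flash_acr` are hypothetical names, and the bit positions (LATENCY in bits 3:0, PRFTEN in bit 8, bit 9 as ICEN on the F4 or ARTEN on the F7, bit 10 as DCEN on the F4) are taken from the ST device headers as I remember them, so verify them against the reference manual for the exact part.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t wait_states;  /* LATENCY field, bits [3:0] */
    bool prefetch;         /* PRFTEN, bit 8 */
    bool bit9_accel;       /* bit 9: ICEN on F4, ARTEN on F7 */
    bool bit10_dcache;     /* bit 10: DCEN on F4 (assumed unused on F7) */
} flash_acr_fields;

/* Split a raw FLASH_ACR value into the fields that affect fetch speed. */
flash_acr_fields decode_flash_acr(uint32_t acr)
{
    flash_acr_fields f;
    f.wait_states  = acr & 0xFu;
    f.prefetch     = ((acr >> 8) & 1u) != 0u;
    f.bit9_accel   = ((acr >> 9) & 1u) != 0u;
    f.bit10_dcache = ((acr >> 10) & 1u) != 0u;
    return f;
}
```

On the target this would be called as `decode_flash_acr(FLASH->ACR)` after HAL_Init() and the clock setup have run. Note that on the F7 the Cortex-M7 core caches are a separate mechanism, controlled through the CMSIS calls SCB_EnableICache() and SCB_EnableDCache() rather than through FLASH_ACR.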

@Piranha @Uwe Bonnes

The major difference in HAL_Init() is the data cache, instruction cache and prefetch configuration.

I tried the combinations and didn't find much difference.

F429 with cache and prefetch enabled - a=998

F429 with cache and prefetch disabled - a=997

F767 with cache and prefetch enabled - a=661

F767 with cache and prefetch disabled - a=661

Is the F767 slow because there is no data caching?

HAL_Init() comparison, F767 vs F429

0690X00000BwaylQAB.jpg

F429_Flash_register_status

0690X00000Bwb24QAB.jpg

F767_Flash_register_status

0690X00000Bwb2TQAR.jpg