STM32MP13 DK Bare Metal project performance issue

Clement7
Associate II

Hi,

 

I'm getting started with STM32MP13 bare-metal projects, following the examples provided in the STM32CubeMP13 package.

Everything works fine, but I'm disappointed by the performance. If my reasoning is correct, the following code should toggle the pin once per second.

 

 

 

void testCPUFreq(){
    GPIO_InitTypeDef GPIO_InitStruct = {0};
    uint32_t i;

    __HAL_RCC_GPIOH_CLK_ENABLE();

    GPIO_InitStruct.Pin = GPIO_PIN_6;
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_VERY_HIGH;
    HAL_GPIO_Init(GPIOH, &GPIO_InitStruct);

    while(1){
        for(i = 0; i < CPU_FREQUENCY; i++){
        }
        HAL_GPIO_TogglePin(GPIOH, GPIO_PIN_6);
    }
}

 

 

 

I made my measurements using a logic analyzer connected to the pin in question.

  • On an STM32F7 target running at 200 MHz, the pin toggles every second (the expected result).
  • On the STM32MP13 DK running bare metal at 650 MHz, using the template provided by ST, the pin toggles every 8 seconds.
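To put the two measurements in the same unit, they can be converted to CPU cycles per loop iteration. The sketch below is only an estimate: it assumes CPU_FREQUENCY is defined as the core clock in Hz on each target (the post does not show its value), and `cycles_per_iteration` is a hypothetical helper name.

```c
/* Estimate the number of CPU cycles spent per loop iteration.
 *   core_hz    : core clock in Hz
 *   iterations : loop count between two toggles
 *                (assumed here to be equal to CPU_FREQUENCY)
 *   period_s   : measured time between two pin toggles, in seconds */
static double cycles_per_iteration(double core_hz, double iterations, double period_s)
{
    /* cycles elapsed during one period, divided by the loop count */
    return (core_hz * period_s) / iterations;
}
```

Under that assumption, `cycles_per_iteration(200e6, 200e6, 1.0)` gives 1 cycle per iteration for the STM32F7 and `cycles_per_iteration(650e6, 650e6, 8.0)` gives 8 for the STM32MP13, so the gap is in cycles per iteration rather than in clock speed.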

For the STM32MP13, I use the following configuration:

I'm not using DDR, and I build with the MMU_USE and CACHE_USE preprocessor symbols defined.

In addition, I'm using the following defines:

 

 

 

#define PREFETCH_ENABLE            1U
#define INSTRUCTION_CACHE_ENABLE   1U
#define DATA_CACHE_ENABLE          1U

 

 

 

Have I forgotten a configuration for optimum performance?

Is anyone experiencing performance problems with STM32MP13 Bare metal?

 

Thanks for your reply.

 

 

Accepted Solutions
PatrickF
ST Employee

Hi @Clement7 

It is likely that the code in SYSRAM is not located in a cacheable region.

Note that the overall performance of such code will be much better on the Cortex-M7 in the STM32F7 than on the Cortex-A7 in the STM32MP13 (but not by a factor of 8!).

Performance also depends on the bus clock frequencies (e.g. the AXI and AHB clocks used for SYSRAM and for GPIO), the compiler (GCC gives slightly lower performance than IAR or Keil/ARM) and the optimization level (use -O2 or -O3).

 

For reference, the CoreMark/MHz score of the Cortex-M7 is above 5 (with IAR, I guess), while it is around 3.2 (with GCC) for the Cortex-A7 (Cortex-A is tailored more to complex multi-threaded workloads than to pure real-time use).

Regards.
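As a rough illustration of the CoreMark/MHz figures quoted above, they can be scaled by the core clocks mentioned in the question. This is back-of-the-envelope arithmetic only: "above 5" is taken as 5.0, the clocks are the 200 MHz and 650 MHz from the original post, and `coremark_estimate` is a hypothetical helper, not part of any ST library.

```c
/* Estimated CoreMark score = (CoreMark/MHz) * core clock in MHz.
 * Inputs are approximate figures, so the result is an order of magnitude,
 * not a measured benchmark score. */
static double coremark_estimate(double coremark_per_mhz, double core_mhz)
{
    return coremark_per_mhz * core_mhz;
}
```

With these inputs, `coremark_estimate(5.0, 200.0)` is about 1000 for the Cortex-M7 and `coremark_estimate(3.2, 650.0)` is about 2080 for the Cortex-A7. Per MHz the Cortex-M7 leads by roughly 5 / 3.2 ≈ 1.6, which is far from a factor of 8.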


Clement7
Associate II

Hello @PatrickF 

Thank you for your fast reply and your clarifications. 

I don't know how I overlooked the optimization settings. Enabling optimization level 1 (-O1) with GCC was enough to close the 8x gap. The loop now toggles the pin every second, as expected.

Regards.
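A side note for anyone reusing this kind of test: a C compiler is allowed to remove a loop with an empty body once optimization is enabled, since it has no observable effect. A minimal sketch of one way to keep the loop, assuming GCC or Clang (the empty `__asm__ volatile` statement is a compiler extension, and `spin` is a hypothetical name):

```c
#include <stdint.h>

/* Busy-wait for `count` iterations.
 * The empty asm statement emits no instruction, but because it is volatile
 * the compiler must execute it on every iteration, so the loop is kept
 * at any optimization level. */
static uint32_t spin(uint32_t count)
{
    uint32_t i;

    for (i = 0; i < count; i++) {
        __asm__ volatile ("");
    }
    return i; /* number of iterations actually executed */
}
```

The time per iteration still depends on the compiler, the optimization level and the cache configuration, so this is only suitable for a rough comparison. For an actual delay, a timer-based wait such as HAL_Delay() is the more reliable choice.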