2024-07-11 05:47 AM - last edited on 2024-07-11 06:02 AM by STTwo-32
Hi,
I'm getting started with STM32MP13 bare-metal development, following the examples provided in the STM32CubeMP13 package.
Everything works fine, except that I'm disappointed by the performance. If my theory is correct, the following code should change the pin status every second.
void testCPUFreq(){
    GPIO_InitTypeDef GPIO_InitStruct = {0};
    uint32_t i;
    __HAL_RCC_GPIOH_CLK_ENABLE();
    GPIO_InitStruct.Pin = GPIO_PIN_6;
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_VERY_HIGH;
    HAL_GPIO_Init(GPIOH, &GPIO_InitStruct);
    while(1){
        for(i = 0; i < CPU_FREQUENCY; i++){
        }
        HAL_GPIO_TogglePin(GPIOH, GPIO_PIN_6);
    }
}
I made my measurements using a logic analyzer connected to the pin in question.
For the STM32MP13, I use the following configuration:
I'm not using DDR, and I have the MMU_USE and CACHE_USE preprocessor symbols defined.
In addition, I'm using the following defines:
#define PREFETCH_ENABLE 1U
#define INSTRUCTION_CACHE_ENABLE 1U
#define DATA_CACHE_ENABLE 1U
Have I forgotten a configuration for optimum performance?
Is anyone experiencing performance problems with STM32MP13 Bare metal?
Thanks for your reply.
2024-07-11 06:15 AM - edited 2024-07-11 07:20 AM
Hi @Clement7
It is likely that the code in SYSRAM is not in a cacheable area.
Note that the overall performance of such code will be much better on the Cortex-M7 in an STM32F7 than on the Cortex-A7 in the STM32MP13 (but not by a factor of 8!).
Performance also depends on the bus clock frequencies (e.g. the AXI and AHB clocks used for SYSRAM and for GPIO), the compiler (GCC gives slightly lower performance than IAR or Keil/ARM) and the optimization level (use -O2 or -O3).
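To see why the optimization level matters for this test, the toggle period can be estimated from the loop count and the number of CPU cycles each iteration costs. The sketch below is only illustrative: `toggle_period_s` is a hypothetical helper, and the clock and cycles-per-iteration values passed to it are assumptions, not measurements from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: estimated time between two pin toggles, in seconds.
 * iterations      - loop count (CPU_FREQUENCY in the test loop)
 * cycles_per_iter - CPU cycles spent per loop iteration (assumed value)
 * cpu_hz          - core clock in Hz (assumed value) */
static double toggle_period_s(uint32_t iterations,
                              double cycles_per_iter,
                              double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return ((double)iterations * cycles_per_iter) / cpu_hz;
}
```

For example, with an assumed 650 MHz core clock and a loop count of 650000000, one cycle per iteration gives a period of 1 s, whereas eight cycles per iteration gives 8 s. The empty loop only toggles once per second if each iteration really costs a single cycle, which depends on the generated code.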
For reference, the CoreMark/MHz of the Cortex-M7 is above 5 (with IAR, I guess), while it is around 3.2 (GCC) for the Cortex-A7 (Cortex-A is tailored more for complex multi-threaded usage than for pure real-time).
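As a rough illustration of how these per-MHz figures translate into absolute scores, the estimate is simply the CoreMark/MHz value multiplied by the core clock. `coremark_estimate` is a hypothetical helper, and the clock values in the note after the code are example inputs, not the configuration used in this thread.

```c
#include <assert.h>

/* Hypothetical helper: estimated CoreMark score from a CoreMark/MHz
 * figure and a core clock in MHz (both supplied by the caller). */
static double coremark_estimate(double coremark_per_mhz, double clock_mhz)
{
    assert(coremark_per_mhz > 0.0 && clock_mhz > 0.0);
    return coremark_per_mhz * clock_mhz;
}
```

With example clocks of 216 MHz for the Cortex-M7 and 650 MHz for the Cortex-A7, this gives about 1080 and 2080 respectively, so the higher clock can outweigh the lower per-MHz figure.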
Regards.
2024-07-15 12:19 AM
Hello @PatrickF
Thank you for your fast reply and your clarifications.
I don't know how I missed the optimizations. Enabling level 1 (-O1) with GCC was enough to make up for the x8. Now the loop changes the pin status every second, as expected.
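One caveat with this kind of test: a compiler is allowed to remove an empty loop whose counter is an ordinary variable, so the result can change with the compiler version and optimization level. A minimal sketch of a delay loop that cannot be deleted is shown below; `busy_wait` is a hypothetical name, and the time per iteration still depends on the clock and the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Busy-wait for the given number of loop iterations and return how many
 * were executed. The counter is volatile, so the compiler must perform
 * every read and write of it and cannot delete the loop, whatever the
 * -O level. */
static uint32_t busy_wait(uint32_t iterations)
{
    volatile uint32_t i;
    for (i = 0; i < iterations; i++) {
        /* intentionally empty */
    }
    assert(i == iterations); /* the loop ran to completion */
    return i;
}
```

Because the volatile counter is loaded and stored on every pass, an iteration costs more than one cycle, so this is not a calibrated one-second delay. For accurate timing, a hardware timer or `HAL_Delay()` is usually a more robust choice.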
Regards.