STM32MP13 DK Bare Metal project performance issue
‎2024-07-11 05:47 AM - last edited on ‎2024-07-11 06:02 AM by STTwo-32
Hi,
I'm getting started with bare-metal projects on the STM32MP13, following the example provided in the STM32CubeMP13 package.
Everything works, but I'm disappointed by the performance. If my reasoning is correct, the following code should toggle the pin once per second.
void testCPUFreq(void){
    GPIO_InitTypeDef GPIO_InitStruct = {0};
    uint32_t i;

    /* Configure PH6 as a push-pull output */
    __HAL_RCC_GPIOH_CLK_ENABLE();
    GPIO_InitStruct.Pin = GPIO_PIN_6;
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_VERY_HIGH;
    HAL_GPIO_Init(GPIOH, &GPIO_InitStruct);

    while(1){
        /* Busy-wait: CPU_FREQUENCY empty iterations between toggles */
        for(i = 0; i < CPU_FREQUENCY; i++){
        }
        HAL_GPIO_TogglePin(GPIOH, GPIO_PIN_6);
    }
}
I made my measurements using a logic analyzer connected to the pin in question.
- When I run the test on an STM32F7 target running at 200MHz, the pin toggles every second (the expected result).
- When I run the same code bare metal on the STM32MP13 DK running at 650MHz, using the template provided by ST, the pin toggles every 8 seconds.
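As a rough sanity check on these two measurements, the toggle period can be converted into core cycles per loop iteration. This is only a sketch: it assumes CPU_FREQUENCY equals the core clock in Hz on each target (200000000 and 650000000), and the helper name cycles_per_iteration is hypothetical.

```c
#include <stdint.h>

/* Estimate core cycles spent per loop iteration from a measured toggle period.
 * Assumption: the caller passes the real core clock and the real loop bound;
 * in the test above both are taken to be CPU_FREQUENCY. */
static double cycles_per_iteration(double toggle_period_s,
                                   uint32_t core_clock_hz,
                                   uint32_t loop_iterations)
{
    return toggle_period_s * (double)core_clock_hz / (double)loop_iterations;
}
```

Under that assumption, the STM32F7 figures (1 s, 200 MHz) give about 1 cycle per iteration and the STM32MP13 figures (8 s, 650 MHz) give about 8 cycles per iteration, so the gap is per-iteration cost rather than clock speed.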
For the STM32MP13, I use the following configuration:
I'm not using DDR, and I build with the MMU_USE and CACHE_USE preprocessor defines.
In addition, I'm using the following defines:
#define PREFETCH_ENABLE 1U
#define INSTRUCTION_CACHE_ENABLE 1U
#define DATA_CACHE_ENABLE 1U
Have I missed a configuration setting needed for optimum performance?
Is anyone else experiencing performance problems with STM32MP13 bare metal?
Thanks for your reply.
Accepted Solutions
‎2024-07-11 06:15 AM - edited ‎2024-07-11 07:20 AM
Hi @Clement7
It is likely that the code in SYSRAM is not located in a cacheable region.
Note that the overall performance of such code will be much better on the Cortex-M7 inside the STM32F7 than on the Cortex-A7 inside the STM32MP13 (but not by a factor of 8!).
Performance also depends on the bus clock frequencies (e.g. the AXI and AHB clocks used for SYSRAM and for GPIO), the compiler (GCC performs slightly worse than IAR or Keil/ARM) and the optimization level (use -O2 or -O3).
For reference, the CoreMark/MHz score of the Cortex-M7 is above 5 (with IAR, I guess), while it is around 3.2 (GCC) for the Cortex-A7 (Cortex-A is tailored more for complex multi-threaded usage than for pure real-time).
Regards.
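To put the per-MHz figures quoted above in context, they can be scaled by the two clock rates mentioned in the question. This is a back-of-the-envelope sketch only: it takes 5.0 as a lower bound for the Cortex-M7, uses the approximate 3.2 for the Cortex-A7, and the helper name coremark_estimate is hypothetical.

```c
/* Scale a CoreMark/MHz figure by the core clock to get a rough total score.
 * Assumption: the score scales linearly with clock, which ignores memory
 * placement, caches and bus clocks. */
static double coremark_estimate(double coremark_per_mhz, double core_clock_mhz)
{
    return coremark_per_mhz * core_clock_mhz;
}
```

With those inputs the Cortex-M7 at 200 MHz comes out at roughly 1000 or more, and the Cortex-A7 at 650 MHz at roughly 2080, so the lower per-MHz score alone would not account for an 8x slowdown.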
‎2024-07-15 12:19 AM
Hello @PatrickF
Thank you for your quick reply and your clarifications.
I don't know how I overlooked the optimization settings. Enabling optimization level 1 (-O1) with GCC was enough to recover the factor of 8. The loop now toggles the pin every second, as expected.
Regards.
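One caveat for anyone reusing this kind of test: once optimization is enabled, a compiler is allowed to remove an empty loop that has no side effects, so the delay can shrink or disappear at other optimization levels. A minimal sketch of a busy-wait that survives optimization, assuming a volatile counter is acceptable (the function name busy_wait is hypothetical and not part of the ST HAL):

```c
#include <stdint.h>

/* Busy-wait for a fixed number of iterations.
 * `volatile` forces every read and write of the counter to be performed,
 * so the loop is not removed at -O1 and above. */
static uint32_t busy_wait(uint32_t iterations)
{
    volatile uint32_t i;
    for (i = 0; i < iterations; i++) {
    }
    return i; /* equals `iterations` once the loop has finished */
}
```

The cost per iteration still depends on the core, the memory holding the counter and the compiler, so the iteration count has to be calibrated per target. For a delay that does not depend on any of these, a timer-based call such as HAL_Delay() is usually a better fit.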