How to speed up the code execution on STM32H747 / Cortex-M4?

cbcooper
Associate III

I'm programming one of the dual-core STM32s (the STM32H747IGT6) and running into a situation where the CM4 is much slower than I need it to be, and I'm trying to figure out why.

I'm using this short piece of code as my test bed:

        __COMPILER_BARRIER();
        start_timer = TIM5->CNT;
        if ( (loopState == WaitingToStoreValues) && (i < last_i) )
        {
            num_data = 0;
            loopState = StoringValues;
        }
        end_timer = TIM5->CNT;
        __COMPILER_BARRIER();

start_timer, loopState, i, last_i, and num_data are all stack variables, and TIM5 is set to run at full speed (240 MHz) so it ticks once every 4.1666 nsec.  
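The tick arithmetic above can be sketched in plain C. This is a minimal host-runnable sketch, not code from the project: the helper names `elapsed_ticks` and `ticks_to_ns` are hypothetical, `TIMER_HZ` is the 240 MHz figure stated above, and it assumes TIM5 is free-running as a full 32-bit up-counter so that unsigned subtraction stays correct across a wraparound.

```c
#include <stdint.h>

/* TIM5 clock as stated in the post (assumption: no prescaler). */
#define TIMER_HZ 240000000u

/* Elapsed ticks between two reads of a free-running 32-bit up-counter.
 * Unsigned subtraction is modulo 2^32, so a single wraparound between
 * the two reads still yields the right delta. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a tick count to whole nanoseconds (truncating).
 * The 64-bit intermediate cannot overflow for any 32-bit tick count. */
static uint64_t ticks_to_ns(uint32_t ticks)
{
    return ((uint64_t)ticks * 1000000000ull) / TIMER_HZ;
}
```

For example, `ticks_to_ns(240)` gives exactly 1000 ns, which matches the 4.1666 ns per tick figure.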

The conditions in the 'if' statement are rarely met, and when I run this code a gazillion times the delta is 56 ticks 99% of the time so I'm assuming that's the "condition not met" situation.  The rest of the time the delta is anywhere from 18 to 84 ticks.
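One way to make the "56 ticks 99% of the time" observation more precise is to bin every measured delta and look at the whole distribution rather than just the common case. The following is a rough sketch under assumptions: `delta_hist_t`, `hist_add`, `hist_mode`, and the `MAX_DELTA` cap are hypothetical names chosen for illustration, not anything from the original test bed.

```c
#include <stdint.h>

/* Hypothetical cap: deltas at or above this land in the overflow bin. */
#define MAX_DELTA 128u

typedef struct {
    uint32_t bins[MAX_DELTA];  /* bins[d] = number of samples with delta d */
    uint32_t overflow;         /* samples with delta >= MAX_DELTA */
    uint32_t total;            /* all samples recorded */
} delta_hist_t;

/* Record one measured delta (in timer ticks). */
static void hist_add(delta_hist_t *h, uint32_t delta)
{
    if (delta < MAX_DELTA) {
        h->bins[delta]++;
    } else {
        h->overflow++;
    }
    h->total++;
}

/* Return the most frequent delta among the binned samples.
 * Ties resolve to the smallest delta; an empty histogram returns 0. */
static uint32_t hist_mode(const delta_hist_t *h)
{
    uint32_t best = 0;
    for (uint32_t d = 1; d < MAX_DELTA; d++) {
        if (h->bins[d] > h->bins[best]) {
            best = d;
        }
    }
    return best;
}
```

Calling `hist_add(&h, end_timer - start_timer)` after each measurement and dumping the bins afterwards shows whether the outliers cluster at a few fixed values or are spread out, which helps separate a systematic cause from random stalls.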

I tried bracketing with __disable_irq/__enable_irq and it made no difference, so it's not IRQs causing the slowdown.

Is what I'm seeing consistent with the instruction read-ahead feature of the CM4? 

Is there any chance that there is resource contention with the CM7?  The CM4 is using SRAM3 for its RAM and the CM7 isn't using SRAM3 at all, but I'm not sure if there's some kind of bus contention going on.

 

2 REPLIES
mƎALLEm
ST Employee

Hello,

1- How did you conclude that the CM4 is executing much slower than expected? Compared with what?

2- Where is the CM4 executing the code from?

3- Did you enable the ART for that memory region? (At most 1 MB can be accelerated.)


From stm32h7xx_hal.c:

uint32_t common_system_clock;

#if defined(DUAL_CORE) && defined(CORE_CM4)
   /* Configure Cortex-M4 Instruction cache through ART accelerator */
   __HAL_RCC_ART_CLK_ENABLE();                   /* Enable the Cortex-M4 ART Clock */
   __HAL_ART_CONFIG_BASE_ADDRESS(0x08100000UL);  /* Configure the Cortex-M4 ART Base address to the Flash Bank 2 : */
   __HAL_ART_ENABLE();                           /* Enable the Cortex-M4 ART */
#endif /* DUAL_CORE &&  CORE_CM4 */

-> Here the acceleration covers the range 0x08100000 to 0x081FFFFF. There is no acceleration outside this region.
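To check whether a given code address falls inside that accelerated window, a small range test is enough. This is only an illustrative sketch: `art_covers`, `ART_BASE_ADDR`, and `ART_WINDOW_SIZE` are hypothetical names, with the base taken from the HAL snippet above and the 1 MB size from the range just quoted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base address passed to __HAL_ART_CONFIG_BASE_ADDRESS in the HAL snippet. */
#define ART_BASE_ADDR   0x08100000UL
/* Size of the accelerated window (1 MB), per the range quoted above. */
#define ART_WINDOW_SIZE 0x00100000UL

/* Return true if addr lies in [ART_BASE_ADDR, ART_BASE_ADDR + 1 MB - 1]. */
static bool art_covers(uint32_t addr)
{
    return addr >= ART_BASE_ADDR &&
           addr <= ART_BASE_ADDR + (ART_WINDOW_SIZE - 1UL);
}
```

On the target, the address of the function under test (taken from the map file or by casting a function pointer) could be passed in; code placed in Flash Bank 1 or in RAM would return false here.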

 

AScha.3
Super User

Hi,

>is much slower than I need it to be

What optimizer setting do you use? (You didn't say.)

So try -Ofast or -O2 (I always use -O2; it gives the best balance of code size and speed).
