STM32H7 Dual-Core: GPIO access on CM7 6 times slower than on CM4. Why?

I tested the GPIO performance on both cores CM7 and CM4. To do this I used the following loop:

  /* Infinite loop */
  while (1)
  {
    /* USER CODE BEGIN 3 */
    LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);      // PC8
    LL_GPIO_SetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);      // PC9
    LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);
    LL_GPIO_ResetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);
    LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);
    LL_GPIO_SetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);
    LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);
    LL_GPIO_ResetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);
    HAL_GPIO_TogglePin(LD1_GPIO_Port, LD1_Pin);
    HAL_Delay(1);                                            // also called in the listings below
  }
  /* USER CODE END 3 */

I checked the outputs PC8 and PC9 on an oscilloscope.

A single output operation takes about 25 ns on the CM7 but only about 4.2 ns on the CM4. Why is the CM7 slower than the CM4?

Used tools and configurations:

Test board: NUCLEO-H755ZI-Q

Initial config: STM32CubeMX v6.4.0

Toolchain: MDK-ARM v5.36.0.0

C/C++ Optimization: Level 3 (-O3)

CM7 config:

- Clock frequency: 480MHz (VOS0)

- CPU ICache and DCache enabled

- assigned memory: DTCM RAM (0x2000 0000 - 0x2001 FFFF)

- MPU not used

CM4 config:

- Clock frequency: 240MHz (VOS0)

- assigned memory: SRAM1, SRAM2, SRAM3 (0x1000 0000 - 0x1004 7FFF)

- MPU not used

Additionally, I checked the generated .axf files (see below). Both cores use the same assembler instructions for the GPIO access.

fromelf H755ZI_CM7_TIM1_Perf_CM7.axf --disassemble --interleave=source --text -c --output=CM7_out.lst

--- CM7_out.lst ---
;;; ../CM7/Core/Src/main.c (151)
        0x080036ee:    0075        u.      LSLS     r5,r6,#1
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x080036f0:    042f        /.      LSLS     r7,r5,#16
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x080036f2:    61a6        .a      STR      r6,[r4,#0x18]
        0x080036f4:    61a5        .a      STR      r5,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x080036f6:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x080036fa:    61a7        .a      STR      r7,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x080036fc:    61a6        .a      STR      r6,[r4,#0x18]
        0x080036fe:    61a5        .a      STR      r5,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08003700:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x08003704:    61a7        .a      STR      r7,[r4,#0x18]
;;; ../CM7/Core/Src/main.c (158)
        0x08003706:    2101        .!      MOVS     r1,#1
        0x08003708:    4648        HF      MOV      r0,r9
        0x0800370a:    f7fdf83d    ..=.    BL       HAL_GPIO_TogglePin ; 0x8000788
;;; ../CM7/Core/Src/main.c (159)
        0x0800370e:    2001        .       MOVS     r0,#1
        0x08003710:    f7fcff12    ....    BL       HAL_Delay ; 0x8000538
        0x08003714:    e7ed        ..      B        0x80036f2 ; main + 282

fromelf H755ZI_CM4_TIM1_Perf_CM4.axf --disassemble --interleave=source --text -c --output=CM4_out.lst

--- CM4_out.lst ---
;;; ../CM4/Core/Src/main.c (140)
        0x08102d2c:    f8df9044    ..D.    LDR      r9,[pc,#68] ; [0x8102d74] = 0x58020400
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08102d30:    f04f7880    O..x    MOV      r8,#0x1000000
        0x08102d34:    046f        o.      LSLS     r7,r5,#17
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x08102d36:    61a5        .a      STR      r5,[r4,#0x18]
        0x08102d38:    61a6        .a      STR      r6,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08102d3a:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x08102d3e:    61a7        .a      STR      r7,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x08102d40:    61a5        .a      STR      r5,[r4,#0x18]
        0x08102d42:    61a6        .a      STR      r6,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08102d44:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x08102d48:    61a7        .a      STR      r7,[r4,#0x18]
        0x08102d4a:    2101        .!      MOVS     r1,#1
        0x08102d4c:    4648        HF      MOV      r0,r9
        0x08102d4e:    f7fdfd2f    ../.    BL       HAL_GPIO_TogglePin ; 0x81007b0
;;; ../CM4/Core/Src/main.c (141)
        0x08102d52:    2001        .       MOVS     r0,#1
        0x08102d54:    f7fdfbf0    ....    BL       HAL_Delay ; 0x8100538
        0x08102d58:    e7ed        ..      B        0x8102d36 ; main + 142

Is it possible to improve the GPIO performance on the core CM7?

Kind regards


ST Employee


This is a known behavior in the H7 family. Please check out these discussions in the community:

The GPIOs are parked way off on AHB4/APB4, and the M4 is closer to them. Look at the bus architecture diagram.

>>is it possible to improve the GPIO performance on the core CM7?

Perhaps use the function to write the state of both pins together? Pretty much double the toggle rate right there.
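The suggestion above can be sketched as follows. The BSRR register takes set bits in its lower half-word and reset bits in its upper half-word, which matches the store values in the listings, so one store can change several pins of the same port. This is a minimal sketch, not code from the original project: `fake_gpio_t`, `bsrr_word` and `write_pins` are hypothetical names, and the struct is only a host-side stand-in for the real GPIO register block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPIO register block; on the target the
   real GPIO_TypeDef from the device header would be used instead. */
typedef struct {
    volatile uint32_t BSRR;   /* bits 0..15 set pins, bits 16..31 reset pins */
} fake_gpio_t;

#define PIN8 (1UL << 8)       /* PC8 in the test above */
#define PIN9 (1UL << 9)       /* PC9 in the test above */

/* Build one BSRR value that sets and resets several pins at once. */
uint32_t bsrr_word(uint32_t set_mask, uint32_t reset_mask)
{
    assert((set_mask | reset_mask) <= 0xFFFFUL);  /* only 16 pins per port */
    return set_mask | (reset_mask << 16);
}

/* One bus write instead of one write per pin. */
void write_pins(fake_gpio_t *port, uint32_t set_mask, uint32_t reset_mask)
{
    port->BSRR = bsrr_word(set_mask, reset_mask);
}
```

With the LL driver the same effect should be reachable by OR-ing the pin masks in a single call, for example `LL_GPIO_SetOutputPin(GPIOC, LL_GPIO_PIN_8 | LL_GPIO_PIN_9)`, assuming both pins sit on the same port as they do here.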

Most people don't use a 400 MHz MCU to do this. Couldn't you drive the pins with a TIM, or use a CPLD if you want wire-speed operation?

Associate II

Thank you for your feedback.

I performed some further measurements using the DWT and an oscilloscope. It seems I have to accept the longer GPIO access times on the CM7 core.
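For readers who want to repeat the DWT part, a cycle measurement could be set up roughly like this. It is a sketch under assumptions, not the code used for the numbers in this thread: the register and mask names follow CMSIS (`DWT->CYCCNT`, `DWT_CTRL_CYCCNTENA_Msk`, `CoreDebug_DEMCR_TRCENA_Msk`), while the `fake_*` types and the three helper functions are hypothetical and exist only so the block compiles on a host without the device header.

```c
#include <stdint.h>

/* Host-side stand-ins (hypothetical). On the target, remove this block
   and include the device header, which provides the real definitions. */
typedef struct { volatile uint32_t CTRL; volatile uint32_t CYCCNT; } fake_dwt_t;
typedef struct { volatile uint32_t DEMCR; } fake_coredebug_t;
static fake_dwt_t fake_dwt;
static fake_coredebug_t fake_coredebug;
#define DWT                        (&fake_dwt)
#define CoreDebug                  (&fake_coredebug)
#define CoreDebug_DEMCR_TRCENA_Msk (1UL << 24)
#define DWT_CTRL_CYCCNTENA_Msk     (1UL << 0)

/* Enable the trace block, clear the counter, then start it. */
void cycle_counter_start(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
}

uint32_t cycle_counter_read(void)
{
    return DWT->CYCCNT;
}

/* Unsigned subtraction stays correct across one 32-bit wrap-around. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Intended use on the target:
 *   cycle_counter_start();
 *   uint32_t t0 = cycle_counter_read();
 *   LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);
 *   uint32_t dt = cycles_elapsed(t0, cycle_counter_read());
 */
```

The result includes the cost of reading the counter itself, so it helps to measure an empty sequence first and subtract it, or to time a long run of identical stores and divide.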

Summary of my measurements:

Core CM7, 480 MHz
LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);         // 12 cycles, 25 ns
LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);       // 12 cycles, 25 ns
WRITE_REG(TIM1->CCMR1, 0x00000050);         // TIM1 CH1 FORCED_ACTIVE;   8 cycles, 16.7 ns
WRITE_REG(TIM1->CCMR1, 0x00000040);         // TIM1 CH1 FORCED_INACTIVE; 8 cycles, 16.7 ns

Core CM4, 240 MHz
LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);         // 1 cycle, 4.2 ns
LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);       // 1 cycle, 4.2 ns
WRITE_REG(TIM1->CCMR1, 0x00000050);         // TIM1 CH1 FORCED_ACTIVE;   4 cycles, 16.7 ns
WRITE_REG(TIM1->CCMR1, 0x00000040);         // TIM1 CH1 FORCED_INACTIVE; 4 cycles, 16.7 ns
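
The cycle counts and times in this summary are related by the core clock: time in ns is cycles times 1000 divided by the clock in MHz. A small helper makes that conversion explicit; `cycles_to_ns` is a hypothetical name introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a core clock given in MHz. */
double cycles_to_ns(uint32_t cycles, double core_mhz)
{
    assert(core_mhz > 0.0);
    return (double)cycles * 1000.0 / core_mhz;
}
```

For example, `cycles_to_ns(12, 480.0)` gives 25 ns for the CM7 GPIO write and `cycles_to_ns(1, 240.0)` gives about 4.2 ns for the CM4, a ratio of roughly six. The TIM1 writes, 8 cycles at 480 MHz and 4 cycles at 240 MHz, both come out at about 16.7 ns, so that access takes the same wall-clock time from either core.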

Kind regards
