STM32H7 Dual-Core: GPIO access on CM7 6 times slower than on CM4. Why?

MLade.1
Associate II

I tested the GPIO performance on both cores CM7 and CM4. To do this I used the following loop:

  /* Infinite loop */
  /* USER CODE BEGIN WHILE */
  while (1)
  {
    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
    LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);    // PC8
    LL_GPIO_SetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);    // PC9
    LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);
    LL_GPIO_ResetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);
    LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);
    LL_GPIO_SetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);
    LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);
    LL_GPIO_ResetOutputPin(DBG_D2_GPIO_Port, DBG_D2_Pin);
    HAL_GPIO_TogglePin(LD1_GPIO_Port, LD1_Pin);
    HAL_Delay(1);
  }
  /* USER CODE END 3 */

On an oscilloscope I checked the outputs PC8 and PC9:

CM7:

0693W00000LxEXBQA3.jpg 

CM4:

0693W00000LxEYOQA3.jpg 

The CM7 needs about 25 ns for a single output operation, while the CM4 needs only 4.2 ns. Why is the CM7 slower than the CM4?

Used tools and configurations:

Test board: NUCLEO-H755ZI-Q

Initial config: STM32CubeMX v6.4.0

Toolchain: MDK-ARM v5.36.0.0

C/C++ Optimization: Level 3 (-O3)

CM7 config:

- Clock frequency: 480MHz (VOS0)

- CPU ICache and DCache enabled

- assigned memory: DTCM RAM (0x2000 0000 - 0x2001 FFFF)

- MPU not used

CM4 config:

- Clock frequency: 240MHz (VOS0)

- assigned memory: SRAM1, SRAM2, SRAM3 (0x1000 0000 - 0x1004 7FFF)

- MPU not used

Additionally, I checked the generated .axf files (see below). Both cores use the same assembler instructions for the GPIO access.

fromelf H755ZI_CM7_TIM1_Perf_CM7.axf --disassemble --interleave=source --text -c --output=CM7_out.lst

--- CM7_out.lst ---
;;; ../CM7/Core/Src/main.c (151)
        0x080036ee:    0075        u.      LSLS     r5,r6,#1
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x080036f0:    042f        /.      LSLS     r7,r5,#16
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x080036f2:    61a6        .a      STR      r6,[r4,#0x18]
        0x080036f4:    61a5        .a      STR      r5,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x080036f6:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x080036fa:    61a7        .a      STR      r7,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x080036fc:    61a6        .a      STR      r6,[r4,#0x18]
        0x080036fe:    61a5        .a      STR      r5,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08003700:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x08003704:    61a7        .a      STR      r7,[r4,#0x18]
;;; ../CM7/Core/Src/main.c (158)
        0x08003706:    2101        .!      MOVS     r1,#1
        0x08003708:    4648        HF      MOV      r0,r9
        0x0800370a:    f7fdf83d    ..=.    BL       HAL_GPIO_TogglePin ; 0x8000788
;;; ../CM7/Core/Src/main.c (159)
        0x0800370e:    2001        .       MOVS     r0,#1
        0x08003710:    f7fcff12    ....    BL       HAL_Delay ; 0x8000538
        0x08003714:    e7ed        ..      B        0x80036f2 ; main + 282

fromelf H755ZI_CM4_TIM1_Perf_CM4.axf --disassemble --interleave=source --text -c --output=CM4_out.lst

--- CM4_out.lst ---
;;; ../CM4/Core/Src/main.c (140)
        0x08102d2c:    f8df9044    ..D.    LDR      r9,[pc,#68] ; [0x8102d74] = 0x58020400
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08102d30:    f04f7880    O..x    MOV      r8,#0x1000000
        0x08102d34:    046f        o.      LSLS     r7,r5,#17
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x08102d36:    61a5        .a      STR      r5,[r4,#0x18]
        0x08102d38:    61a6        .a      STR      r6,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08102d3a:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x08102d3e:    61a7        .a      STR      r7,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (886)
        0x08102d40:    61a5        .a      STR      r5,[r4,#0x18]
        0x08102d42:    61a6        .a      STR      r6,[r4,#0x18]
;;; ../Drivers/STM32H7xx_HAL_Driver/Inc/stm32h7xx_ll_gpio.h (915)
        0x08102d44:    f8c48018    ....    STR      r8,[r4,#0x18]
        0x08102d48:    61a7        .a      STR      r7,[r4,#0x18]
        0x08102d4a:    2101        .!      MOVS     r1,#1
        0x08102d4c:    4648        HF      MOV      r0,r9
        0x08102d4e:    f7fdfd2f    ../.    BL       HAL_GPIO_TogglePin ; 0x81007b0
;;; ../CM4/Core/Src/main.c (141)
        0x08102d52:    2001        .       MOVS     r0,#1
        0x08102d54:    f7fdfbf0    ....    BL       HAL_Delay ; 0x8100538
        0x08102d58:    e7ed        ..      B        0x8102d36 ; main + 142
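
For reference, the eight STR instructions in both listings write to offset 0x18 from the port base, which is the BSRR register in the STM32H7 GPIO register map (low half-word sets pins, high half-word resets them). The sketch below reconstructs the values implied by the shift instructions in the CM7 listing. It assumes r6 holds the PC8 mask 0x100, since the excerpt does not show where r6 is loaded, and it takes the r8 constant from the MOV in the CM4 listing. The helper names `bsrr_set_bit`, `bsrr_reset_bit` and `cm7_store_values` are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* BSRR layout: bits 0..15 set a pin, bits 16..31 reset it. */
static uint32_t bsrr_set_bit(unsigned pin)
{
    assert(pin < 16u);
    return UINT32_C(1) << pin;
}

static uint32_t bsrr_reset_bit(unsigned pin)
{
    assert(pin < 16u);
    return UINT32_C(1) << (pin + 16u);
}

/* Register contents stored to BSRR in the CM7 loop. */
typedef struct {
    uint32_t r5, r6, r7, r8;
} cm7_regs_t;

static cm7_regs_t cm7_store_values(void)
{
    cm7_regs_t v;
    v.r6 = bsrr_set_bit(8);       /* ASSUMPTION: PC8 mask, loaded before the excerpt */
    v.r5 = v.r6 << 1;             /* LSLS r5,r6,#1  -> PC9 set mask   */
    v.r7 = v.r5 << 16;            /* LSLS r7,r5,#16 -> PC9 reset mask */
    v.r8 = UINT32_C(0x1000000);   /* PC8 reset mask (MOV r8,#0x1000000 in the CM4 listing) */
    return v;
}
```

Under that assumption the stores are set PC8, set PC9, reset PC8, reset PC9, repeated twice, which matches the C source line by line.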

Is it possible to improve the GPIO performance on the core CM7?

Kind regards

Michael

3 Replies
SofLit
ST Employee

Hello,

This is a known behavior in H7 family. Please check out these discussions in the community:

https://community.st.com/s/global-search/stm32h7-gpio-togle-max-frequency


The GPIOs are parked way off on AHB4/APB4, and the M4 is closer to them. Look at the organizational diagram.

>>is it possible to improve the GPIO performance on the core CM7?

Perhaps use the function to write the state of both pins together? Pretty much double the toggle rate right there.
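
One way to read this suggestion: `LL_GPIO_SetOutputPin` and `LL_GPIO_ResetOutputPin` take a pin mask, so pins on the same port can be OR-ed into a single BSRR write. The host-side sketch below only models the BSRR semantics to show that two writes reach the same port state as four. `gpio_model_t`, `bsrr_write` and the two `pulse_*` functions are hypothetical names and not part of the LL driver.

```c
#include <assert.h>
#include <stdint.h>

#define PIN8 (UINT32_C(1) << 8)   /* PC8 */
#define PIN9 (UINT32_C(1) << 9)   /* PC9 */

/* Host-side model of one GPIO port: odr mirrors the output data
   register, writes counts the bus writes issued to BSRR. */
typedef struct {
    uint32_t odr;
    unsigned writes;
} gpio_model_t;

/* One BSRR write: low half sets pins, high half resets pins.
   If both bits of a pin are set, the set bit has priority. */
static void bsrr_write(gpio_model_t *p, uint32_t value)
{
    uint32_t set   = value & 0xFFFFu;
    uint32_t reset = (value >> 16) & 0xFFFFu;
    p->odr = (p->odr & ~reset) | set;
    p->writes++;
}

/* Original pattern: one write per pin and edge. Returns ODR while high. */
static uint32_t pulse_separately(gpio_model_t *p)
{
    bsrr_write(p, PIN8);
    bsrr_write(p, PIN9);
    uint32_t high = p->odr;
    bsrr_write(p, PIN8 << 16);
    bsrr_write(p, PIN9 << 16);
    return high;
}

/* Combined pattern: both pins share one write per edge. */
static uint32_t pulse_combined(gpio_model_t *p)
{
    bsrr_write(p, PIN8 | PIN9);
    uint32_t high = p->odr;
    bsrr_write(p, (PIN8 | PIN9) << 16);
    return high;
}
```

On the target this would presumably look like `LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin | DBG_D2_Pin);`, which works here because both pins sit on port C. Note that the two pins then switch at the same instant instead of one after the other.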

Most people don't use a 400 MHz MCU to do this, couldn't you drive with a TIM, or a CPLD if you want wire-speed operation?

MLade.1
Associate II

Thank you for your feedback.

I performed some further measurements using the DWT and an oscilloscope. It seems I have to accept the longer GPIO times of the CM7 core. 

Summary of my measurements:

Core CM7, 480 MHz
-----------------
LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);         // 12 cycles, 25 ns
LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);       // 12 cycles, 25 ns
WRITE_REG(TIM1->CCMR1, 0x00000050);         // TIM1 CH1 FORCED_ACTIVE;   8 cycles, 16.7 ns
WRITE_REG(TIM1->CCMR1, 0x00000040);         // TIM1 CH1 FORCED_INACTIVE; 8 cycles, 16.7 ns


Core CM4, 240 MHz
-----------------
LL_GPIO_SetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);         // 1 cycle, 4.2 ns
LL_GPIO_ResetOutputPin(DBG_D1_GPIO_Port, DBG_D1_Pin);       // 1 cycle, 4.2 ns
WRITE_REG(TIM1->CCMR1, 0x00000050);         // TIM1 CH1 FORCED_ACTIVE;   4 cycles, 16.7 ns
WRITE_REG(TIM1->CCMR1, 0x00000040);         // TIM1 CH1 FORCED_INACTIVE; 4 cycles, 16.7 ns
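
The cycle counts and times in the summary above are related by time = cycles / core clock. A minimal sketch of that arithmetic follows, assuming the cycle counts come from two reads of a free-running 32-bit counter such as DWT->CYCCNT. `elapsed_cycles` and `cycles_to_ns` are hypothetical helper names, and the sketch does not subtract the overhead of reading the counter itself.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles between two reads of a free-running 32-bit counter.
   Unsigned subtraction stays correct across a single wraparound. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds at the given core clock in Hz. */
static double cycles_to_ns(uint32_t cycles, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)cycles * 1e9 / cpu_hz;
}
```

For example, `cycles_to_ns(12, 480e6)` gives 25 ns for the CM7 GPIO write, and `cycles_to_ns(1, 240e6)` gives roughly 4.2 ns for the CM4, so the factor of six in wall-clock time corresponds to a factor of twelve in core cycles.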

Kind regards

Michael