
Why does the STM32H743ZI have a longer GPIO access time than the STM32F767ZI?

MasterT
Lead

I have two Nucleo-144 boards, an STM32F767ZI and an STM32H743ZI. Both are Cortex-M7, pin-compatible, and pretty close in AHBx (GPIO) clock: 216 vs 200 MHz.

While testing a TFT LCD display library that requires fast 8-bit access to the LCD data bus, I discovered that the same code running on the two boards shows an almost 2x difference in timing.

The specific problem with interfacing a TFT shield (Arduino UNO form factor) to a Nucleo-144 is that the 8 data bits land in an irregular pattern on 3 GPIO ports, so every byte has to be written to all three ports.

Here is the test code:

#define REGS(x) x
 
#if defined(STM32F767xx)
#define GPIO_INIT()   { RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_GPIOCEN | RCC_AHB1ENR_GPIODEN | RCC_AHB1ENR_GPIOEEN | RCC_AHB1ENR_GPIOFEN; }
#elif defined(STM32H743xx)
#define GPIO_INIT()   { RCC->AHB4ENR |= RCC_AHB4ENR_GPIOAEN | RCC_AHB4ENR_GPIOCEN | RCC_AHB4ENR_GPIODEN | RCC_AHB4ENR_GPIOEEN | RCC_AHB4ENR_GPIOFEN; }
#endif
 
 
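// Pin masks for the LCD data bus on each port; the trailing comments list the LCD data bits (#0..#7) in mask order.
// Unsigned literals keep the later "<< 16" shifts out of the sign bit.
#define DMASK ((1u<<15))                            //#1
#define EMASK ((1u<<13)|(1u<<11)|(1u<<9))           //#3, #5, #6
#define FMASK ((1u<<12)|(1u<<15)|(1u<<14)|(1u<<13)) //#0, #2, #4, #7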
 
#define write_8(d) { \
        GPIOD->REGS(BSRR) = DMASK << 16; \
        GPIOE->REGS(BSRR) = EMASK << 16; \
        GPIOF->REGS(BSRR) = FMASK << 16; \
        GPIOD->REGS(BSRR) = (  ((d) & (1<<1)) << 14); \
        GPIOE->REGS(BSRR) = (  ((d) & (1<<3)) << 10) \
                            | (((d) & (1<<5)) << 6) \
                            | (((d) & (1<<6)) << 3); \
        GPIOF->REGS(BSRR) = (  ((d) & (1<<0)) << 12) \
                            | (((d) & (1<<2)) << 13) \
                            | (((d) & (1<<4)) << 10) \
                            | (((d) & (1<<7)) << 6); \
    }
 
 
//                                             PD15                PE13,PE11,PE9          PF15,PF14,PF13,PF12
#define setWriteDir() { setReadDir(); \
                        GPIOD->MODER |=  0x40000000; GPIOE->MODER |=  0x04440000; GPIOF->MODER |=  0x55000000; }
#define setReadDir()  { GPIOD->MODER &= ~0xC0000000; GPIOE->MODER &= ~0x0CCC0000; GPIOF->MODER &= ~0xFF000000; }
 
 
 
// volatile: every use of temp1/temp2 inside write_8() is a separate memory read
volatile uint32_t temp1, temp2;
 
 
void setup()
{
 
  GPIO_INIT();
  setWriteDir();
/*
  __HAL_RCC_CSI_ENABLE() ;
  __HAL_RCC_SYSCFG_CLK_ENABLE() ;
    HAL_EnableCompensationCell();  
*/
}
 
void loop()
{
  // H743 = 2.777 MHz
  // F767 = 4.908 MHz

  temp1 = 0xAAAAAAAA;
  temp2 = 0x55555555;

  for (uint32_t i = 0; i < 0x1000000UL; i++) {
    write_8(temp1);
    write_8(temp2);
  }
}

Compiled in Arduino IDE 1.8.9.

The question is: why is the STM32H743 slower than the F767?

Is it because the GPIO ports moved to AHB4 (they sit on AHB1 in the F767), or am I missing something? Any thoughts?
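
To put the two figures from the test code comments on a common scale, here is a rough sketch that converts them into AHB clock cycles. It assumes (this is not confirmed in the post) that the measured frequency is the toggle rate of one data pin, so that one period corresponds to one loop iteration, i.e. two write_8() calls with six BSRR stores each. The function names are made up for illustration.

```c
#include <assert.h>

/* AHB clock cycles that elapse during one period of the measured signal. */
static double ahb_cycles_per_period(double ahb_hz, double measured_hz)
{
    assert(measured_hz > 0.0);
    return ahb_hz / measured_hz;
}

/* Assumption: one measured period = one loop iteration = two write_8()
   calls, each doing three clearing and three setting BSRR stores. */
enum { BSRR_WRITES_PER_PERIOD = 2 * 6 };

/* Average AHB cycles spent per BSRR store under the assumption above.
   This lumps in the loop overhead, the volatile loads and the masking. */
static double ahb_cycles_per_write(double ahb_hz, double measured_hz)
{
    return ahb_cycles_per_period(ahb_hz, measured_hz) / BSRR_WRITES_PER_PERIOD;
}
```

With 216 MHz / 4.908 MHz for the F767 and 200 MHz / 2.777 MHz for the H743 this gives about 44 and 72 AHB cycles per period, or roughly 3.7 and 6.0 cycles per store. The frequency ratio itself is about 1.77, which matches the "almost 2x" observation.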

2 REPLIES

In the 'H7, first, there's a conversion from AXI to AHB in the D1 matrix, and that may introduce some delay. In the 'F7, the AHB buses are served by a dedicated AHBP port of the processor, bypassing AXI (in the 'H7, this port is used exclusively to connect the D2 peripherals). Then the synchronizer between the two AHB buses (the D1-D3 interconnect) may introduce some delay, too. AFAIK ST does not publish many details on these.

JW
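
If each GPIO access costs extra bus latency on the 'H7, one possible mitigation (a sketch, not something tested on either board) is to halve the number of accesses. BSRR holds the reset bits in its upper half and the set bits in its lower half, and a set bit takes priority over the reset bit for the same pin, so the "clear" and "set" stores of write_8() can be merged into one store per port. The struct and function names below are hypothetical; the masks and shifts are copied from the test code.

```c
#include <assert.h>
#include <stdint.h>

/* Pin masks copied from the test code (unsigned to allow "<< 16"). */
#define DMASK ((1u<<15))                            /* #1             */
#define EMASK ((1u<<13)|(1u<<11)|(1u<<9))           /* #3, #5, #6     */
#define FMASK ((1u<<12)|(1u<<15)|(1u<<14)|(1u<<13)) /* #0, #2, #4, #7 */

/* Hypothetical helper type: one combined BSRR value per port. */
typedef struct {
    uint32_t d, e, f;
} bsrr_triple_t;

/* Build the three BSRR values for one data byte. The upper half resets
   every data pin of the port, the lower half sets the pins whose data
   bit is 1 (set wins over reset when both are requested). */
static bsrr_triple_t bsrr_for_byte(uint8_t data)
{
    uint32_t v = data;
    bsrr_triple_t t;

    t.d = (DMASK << 16) | ((v & (1u<<1)) << 14);
    t.e = (EMASK << 16) | ((v & (1u<<3)) << 10)
                        | ((v & (1u<<5)) << 6)
                        | ((v & (1u<<6)) << 3);
    t.f = (FMASK << 16) | ((v & (1u<<0)) << 12)
                        | ((v & (1u<<2)) << 13)
                        | ((v & (1u<<4)) << 10)
                        | ((v & (1u<<7)) << 6);

    /* Sanity check: set bits must stay inside each port's pin mask. */
    assert((t.d & 0xFFFFu & ~DMASK) == 0);
    assert((t.e & 0xFFFFu & ~EMASK) == 0);
    assert((t.f & 0xFFFFu & ~FMASK) == 0);
    return t;
}
```

On the target the result would be used as `GPIOD->BSRR = t.d; GPIOE->BSRR = t.e; GPIOF->BSRR = t.f;`, three stores instead of six. Note that the pins then no longer pass through the intermediate all-low state that the original macro produces.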

flyer31
Senior

Your sample code shows neither the clock configuration nor the cache setup...

Did you try different caching options in H7 (DCACHE/ICACHE)?

(If you run loops like "for" or "while" on the STM32H7 without ICACHE, the execution speed will change EXTREMELY depending on whether the complete loop code lies within a single 32-byte Flash page or not. If you use ICACHE, this should be solved.)
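
To check the alignment condition described above, one could take the start address and size of the loop from the map file or disassembly and test whether it crosses a 32-byte boundary. A minimal sketch, with a made-up helper name and the 32-byte size taken from the reply:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 32-byte unit mentioned in the reply above. */
#define FLASH_LINE_BYTES 32u

/* Returns true if the byte range [start, start + size) lies entirely
   inside one 32-byte aligned block. */
static bool fits_in_one_flash_line(uint32_t start, uint32_t size)
{
    assert(size > 0);
    uint32_t last = start + size - 1u;
    return (start / FLASH_LINE_BYTES) == (last / FLASH_LINE_BYTES);
}
```

For example, a 16-byte loop starting at 0x08000110 fits, while an 8-byte loop starting at 0x0800011C does not. With GCC a function can be placed on a 32-byte boundary with `__attribute__((aligned(32)))`, and the caches can be switched on with the CMSIS calls `SCB_EnableICache()` and `SCB_EnableDCache()`, assuming the startup code has not already done so.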