2019-05-11 06:18 AM
I have two Nucleo-144 boards, an STM32F767ZI and an STM32H743ZI. Both are Cortex-M7, pin compatible, and close in AHBx (GPIO) clock: 216 vs 200 MHz.
While testing a TFT LCD display library that requires fast 8-bit access to the LCD data bus, I discovered that the same code shows an almost 2x difference in timing between the two boards.
The specific problem with interfacing a TFT shield (Arduino UNO form factor) to a Nucleo-144 is that the data bits are scattered across 3 GPIO ports, so arbitrary bit patterns have to be generated on all three ports for every byte.
Here is the test code:
#define REGS(x) x
#if defined(STM32F767xx)
#define GPIO_INIT() { RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_GPIOCEN | RCC_AHB1ENR_GPIODEN | RCC_AHB1ENR_GPIOEEN | RCC_AHB1ENR_GPIOFEN; }
#elif defined(STM32H743xx)
#define GPIO_INIT() { RCC->AHB4ENR |= RCC_AHB4ENR_GPIOAEN | RCC_AHB4ENR_GPIOCEN | RCC_AHB4ENR_GPIODEN | RCC_AHB4ENR_GPIOEEN | RCC_AHB4ENR_GPIOFEN; }
#endif
// configure macros for the data pins
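// unsigned constants, so that (mask << 16) never shifts into the sign bit of an int
#define DMASK ((1UL<<15)) // data bit #1
#define EMASK ((1UL<<13)|(1UL<<11)|(1UL<<9)) // data bits #3, #5, #6
#define FMASK ((1UL<<12)|(1UL<<15)|(1UL<<14)|(1UL<<13)) // data bits #0, #2, #4, #7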
#define write_8(d) { \
    GPIOD->REGS(BSRR) = DMASK << 16; \
    GPIOE->REGS(BSRR) = EMASK << 16; \
    GPIOF->REGS(BSRR) = FMASK << 16; \
    GPIOD->REGS(BSRR) = ( ((d) & (1<<1)) << 14); \
    GPIOE->REGS(BSRR) = ( ((d) & (1<<3)) << 10) \
                      | (((d) & (1<<5)) << 6) \
                      | (((d) & (1<<6)) << 3); \
    GPIOF->REGS(BSRR) = ( ((d) & (1<<0)) << 12) \
                      | (((d) & (1<<2)) << 13) \
                      | (((d) & (1<<4)) << 10) \
                      | (((d) & (1<<7)) << 6); \
}
// PD15 PE13,PE11,PE9 PF15,PF14,PF13,PF12
#define setWriteDir() { setReadDir(); \
GPIOD->MODER |= 0x40000000; GPIOE->MODER |= 0x04440000; GPIOF->MODER |= 0x55000000; }
#define setReadDir() { GPIOD->MODER &= ~0xC0000000; GPIOE->MODER &= ~0x0CCC0000; GPIOF->MODER &= ~0xFF000000; }
volatile uint32_t temp1, temp2;
void setup()
{
    GPIO_INIT();
    setWriteDir();
    /*
    __HAL_RCC_CSI_ENABLE() ;
    __HAL_RCC_SYSCFG_CLK_ENABLE() ;
    HAL_EnableCompensationCell();
    */
}
void loop()
{
    // H743 = 2.777 MHz
    // F767 = 4.908 MHz
    temp1 = 0xAAAAAAAA;
    temp2 = 0x55555555;
    for( uint32_t i = 0; i < 0x1000000UL; i++) {
        write_8(temp1);
        write_8(temp2);
    }
}
Compiled in Arduino IDE 1.8.9.
The question is: why is the STM32H743 slower than the F767?
Is it because the GPIO bus was moved to AHB4 (on the F767 it is on AHB1), or am I missing something? Any thoughts?
2019-05-12 12:07 AM
In the 'H7, first, there's a conversion from AXI to AHB in the D1 matrix, which may introduce some delay. In the 'F7, the AHB buses are served by the processor's dedicated AHBP port, bypassing AXI (in the 'H7, this port is used exclusively to connect D2 peripherals). Then the synchronizer between the two AHB buses (the D1-D3 interconnect) may introduce some delay, too. AFAIK ST does not publish many details on these.
JW
2019-05-16 11:06 PM
Your sample code shows neither the clock configuration nor the cache setup...
Did you try different caching options on the H7 (DCACHE/ICACHE)?
(If you run loops like "for" or "while" on the STM32H7, the execution speed without ICACHE will change EXTREMELY depending on whether the complete loop code fits within a 32-byte Flash page or not. If you use ICACHE, this should be solved.)