Very slow GPIO on STM32H743?

Martijn Jasperse
Associate II
Posted on June 19, 2018 at 09:07

Hi all,

I've started playing around with the NUCLEO-H743ZI to try out the H7 series and I've noticed that toggling GPIO output with BSRR is very slow. I've used CUBE to configure the board for a 400MHz SYSCLK and my main loop just continuously toggles PE0 with BSRR as below:

while (1)
{
   TOGGLE_GPIO_Port->BSRRH = TOGGLE_Pin;   /* upper half of BSRR: reset bit, pin low */
   TOGGLE_GPIO_Port->BSRRL = TOGGLE_Pin;   /* lower half of BSRR: set bit, pin high */
   TOGGLE_GPIO_Port->BSRRH = TOGGLE_Pin;
   TOGGLE_GPIO_Port->BSRRL = TOGGLE_Pin;
}

Resulting scope trace attached - the output toggles at 17MHz (blue), which seems extremely slow. For comparison MCO2 set to SYSCLK/10 (yellow) is 40MHz, and TIM1 set to PWM (purple) can output 100MHz. I've checked my GPIO speeds (GPIO_SPEED_FREQ_VERY_HIGH) and my peripheral clocks.

Has anyone else observed slow GPIO on the H7? Is there a setting I've missed? Or is there a spec I've missed?

For comparison, the same test on my NUCLEO-F767ZI at 216MHz SYSCLK yields 108MHz. I wasn't expecting the H7 to be 6x slower!

Cheers,

Martijn
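
As a rough way to put the figures above in core-clock terms, the sketch below converts an observed toggle frequency into an average number of SYSCLK cycles per BSRR write. It assumes two writes per output period and ignores the loop branch, so it is only an average; the function name `cycles_per_write` is made up for illustration.

```c
/* Average core cycles spent per GPIO write.
 * Assumption: one output period (high + low) costs exactly two writes,
 * and the branch at the end of the loop is ignored. */
double cycles_per_write(double sysclk_mhz, double toggle_mhz)
{
    double cycles_per_period = sysclk_mhz / toggle_mhz;
    return cycles_per_period / 2.0;
}
```

With the numbers from the post, `cycles_per_write(400.0, 17.0)` comes out at roughly 11.8 cycles per write on the H743, against `cycles_per_write(216.0, 108.0)` = 1.0 on the F767.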

1 ACCEPTED SOLUTION

David SIORPAES
ST Employee
Posted on June 19, 2018 at 09:33

Hello Jasperse.Martijn,

this discussion highlights the reasons behind your observation:

https://community.st.com/0D50X00009XkWN7SAN

4 REPLIES

Posted on June 19, 2018 at 09:51

Thanks for the reference - odd that I didn't see it when I searched. To summarise, this result is 'expected' because the Cortex-M7 is on AXIM and GPIO is on AHB4 so there are two bus connections involved that add latency to each access (RM0433 Table/Figure 1). Is there somewhere in RM0433 or the datasheet that specifies how slow I might expect this to be?

To explain, for my application I need to set a CS line and then immediately read a 16-bit parallel bus. This is easily achievable on the F7 using GPIO with BSRR and IDR, but I cannot think of a way to abuse a peripheral to let me do the same on the H7.
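
The access pattern described here can be sketched in C under some loud assumptions: CS is active-low, the 16 data lines occupy one whole port, and `gpio_regs_t` is a hypothetical stand-in for the CMSIS `GPIO_TypeDef` so that the snippet compiles on its own. It only illustrates the order of the register accesses; it says nothing about how many cycles each one takes on a given part.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two GPIO registers used below.
 * On target, use GPIO_TypeDef from the CMSIS device header instead. */
typedef struct {
    volatile uint32_t IDR;   /* input data register */
    volatile uint32_t BSRR;  /* bit set/reset: low half sets, high half resets */
} gpio_regs_t;

/* Assert CS (assumed active-low), sample the 16-bit bus, release CS. */
static inline uint16_t read_bus_word(gpio_regs_t *cs_port, uint16_t cs_pin,
                                     const gpio_regs_t *data_port)
{
    cs_port->BSRR = (uint32_t)cs_pin << 16;    /* reset bit: CS goes low */
    uint16_t word = (uint16_t)data_port->IDR;  /* read all 16 data pins at once */
    cs_port->BSRR = cs_pin;                    /* set bit: CS goes high */
    return word;
}
```

Whether the read lands soon enough after the CS edge depends on the bus latency discussed in this thread and on the access time of the external device, so the timing would need checking on a scope.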

Posted on June 19, 2018 at 10:50

> To explain, for my application I need to set a CS line and then immediately read a 16-bit parallel bus. This is easily achievable on the F7 using GPIO with BSRR and IDR, but I cannot think of a way to abuse a peripheral to let me do the same on the H7.

Try timer-driven DMA to read the parallel bus, with the CS being output by a different channel of the same timer (or a chained one).

Generally, beyond a certain technological limit, higher apparent processing speeds are (and have to be) achieved by adding contraptions rather than just scaling up the raw clock rate. These contraptions then introduce various forms of latency; and as there are several of them (superscalar execution, pipelining, multilevel caching, (de)parallelizing of various forms), they interact in incredibly complex ways, resulting in gradually less and less predictability and more and more surprising latencies and jitter. It turns out that it may be hard even for the manufacturer to establish hard numbers without firing up a complex (read: lengthy, thus expensive) simulation. And as too many parameters go into each single path's calculation, and there are too many scenarios to deal with all of them, this is inevitably handled by handwaving (i.e. there's no 'how bad it can be' in the DS/RM, and never will be).

Thus, at the end of the day, for any semblance of real-time processing you'll need to stick to the simplest hardware available, or take what you might call a step back and go for a simpler MCU or dedicated hardware (maybe as a slave, here external; some designs, not STM32 or not yet in the general-purpose line, build asymmetrical multicore chips for this purpose).

JW

Posted on June 19, 2018 at 11:26

Thanks for the input! I think I could construct a solution with TIM-triggered DMA as you suggest (e.g. based on AN4666) but the complexity seems to outweigh the advantage, particularly with respect to maintainability. In this case we will stick with the F7.

Your words of caution regarding surprising latencies and jitters are taken to heart. This shows the value of the NUCLEO/DISCO boards, as a platform to check what drawbacks might exist from the particular combinations of trade-offs made in a given product line before committing too hard.