HTajb.1
Associate III
August 27, 2020
Solved

How to toggle a GPIO pin very fast in STM32H750?

  • August 27, 2020
  • 11 replies
  • 14035 views

Dear Sir/Madam,

I chose the STM32H750VBT6 (LQFP-100 package, revision Y, max core clock 400 MHz) for my project, based on my requirements.

Everything was fine until I noticed that this magnificent MCU cannot toggle a GPIO pin faster than about once every 30 ns (a rough estimate). I should emphasize that to reach even this speed I had to stop using HAL_GPIO_WritePin and HAL_GPIO_TogglePin and write to GPIOx->BSRR directly, in a very simple loop.

If I can decrease this time to 15 ns (or less), I will reach my goal.

I searched on the Internet and found a very useful discussion in this link:

https://community.st.com/s/question/0D50X00009cdgGjSAI/stm32h7xx-fast-gpio-toggle

It seems that the H7 series has an intrinsic limit here.

Here are my questions:

1. If I use assembly language directly, can I reach this speed (15 ns)?

2. If I use revision V of this chip (LQFP-100 package), will it solve my problem (considering possible internal hardware differences from revision Y)?

3. I have made the necessary settings in my program (enabling the compensation cell, etc.), but none of them helped. Since this chip is still new to me, maybe I don't know the right settings or the right sequence for them. Can anybody help me with this?

Your consideration is highly appreciated.

This topic has been closed for replies.
11 replies

KnarfB
Super User
August 27, 2020

I'm not asking why you want to do that. But: is your code executed in RAM? If not, you should consider that. A release build using STM32CubeIDE (gcc) with -Ofast (and -g1) makes a single machine instruction for each toggle like:

114 	 LED_GPIO_Port->BSRR = LED_Pin;
080002fc: movs r3, #8
080002fe: ldr r2, [pc, #12] ; (0x800030c <main+164>)
08000300: str r3, [r2, #24]
115 	 LED_GPIO_Port->BRR = LED_Pin;
08000302: str r3, [r2, #40] ; 0x28
08000304: b.n 0x8000300 <main+152>

So there is no advantage in going the assembler route.

TDK
August 28, 2020

I tried this and am a bit unimpressed at the speed.

I'm running at 480 MHz clock speed. STM32H743 board. Instruction cache enabled.

I wrote the following code:

while (1) {
 GPIOC->BSRR = (1 << 10);
 GPIOC->BSRR = (1 << 26);
 GPIOC->BSRR = (1 << 10);
 GPIOC->BSRR = (1 << 26);
 GPIOC->BSRR = (1 << 10);
 GPIOC->BSRR = (1 << 26);
 GPIOC->BSRR = (1 << 10);
 GPIOC->BSRR = (1 << 26);
 GPIOC->BSRR = (1 << 10);
 GPIOC->BSRR = (1 << 26);
 GPIOC->BSRR = (1 << 10);
 GPIOC->BSRR = (1 << 26);
 }

This got translated to the following assembly:

> 0x8009178 <main_cm7()+168>:	ldr	r3, [pc, #36]	; (0x80091a0 <main_cm7()+208>)
 0x800917a <main_cm7()+170>:	mov.w	r1, #1024	; 0x400
 0x800917e <main_cm7()+174>:	mov.w	r2, #67108864	; 0x4000000
 0x8009182 <main_cm7()+178>:	str	r1, [r3, #24]
 0x8009184 <main_cm7()+180>:	str	r2, [r3, #24]
 0x8009186 <main_cm7()+182>:	str	r1, [r3, #24]
 0x8009188 <main_cm7()+184>:	str	r2, [r3, #24]
 0x800918a <main_cm7()+186>:	str	r1, [r3, #24]
 0x800918c <main_cm7()+188>:	str	r2, [r3, #24]
 0x800918e <main_cm7()+190>:	str	r1, [r3, #24]
 0x8009190 <main_cm7()+192>:	str	r2, [r3, #24]
 0x8009192 <main_cm7()+194>:	str	r1, [r3, #24]
 0x8009194 <main_cm7()+196>:	str	r2, [r3, #24]
 0x8009196 <main_cm7()+198>:	str	r1, [r3, #24]
 0x8009198 <main_cm7()+200>:	str	r2, [r3, #24]
 0x800919a <main_cm7()+202>:	b.n	0x8009182 <main_cm7()+178>

So, one str instruction per line.

The result is a PWM at around 20 MHz.

[Image attachment: 0693W000002lbkIQAQ.png]

So no, you can't reach 15ns. Lowest you can get is around 25ns. At least on the chip that I have.
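
As a rough cross-check of these numbers, the measured waveform can be converted back into CPU cycles per store. This is only a sketch: it assumes that every `str` to BSRR produces exactly one edge (so one output period is two writes), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPU cycles spent per GPIO write, assuming one
 * output period consists of exactly two writes (set, then reset). */
static uint32_t cycles_per_gpio_write(uint32_t cpu_hz, uint32_t pin_hz)
{
    assert(pin_hz != 0);
    return (uint32_t)((uint64_t)cpu_hz / (2ull * pin_hz));
}

/* Hypothetical helper: time between two edges in picoseconds,
 * i.e. half of one output period. */
static uint32_t edge_spacing_ps(uint32_t pin_hz)
{
    assert(pin_hz != 0);
    return (uint32_t)(1000000000000ull / (2ull * pin_hz));
}
```

With the figures above, `cycles_per_gpio_write(480000000u, 20000000u)` gives 12 and `edge_spacing_ps(20000000u)` gives 25000, i.e. roughly 12 core clocks and 25 ns per write.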

Of course, you can get way higher through other methods.

If you feel a post has answered your question, please click "Accept as Solution".
waclawek.jan
Best answer
Super User
August 28, 2020

Welcome to the wonderful world of 32 bits.

In the 'H7, GPIO sits on a bus clocked at max. 240 MHz, separated from the processor by 3 (three) bus matrices. This is not your tightly coupled 8-bitter with clean and straightforward timing anymore. In fact, despite being nominally 2 orders of magnitude faster, the "control granularity" has remained roughly the same; only the number-crunching capability has increased.

For control, you have to resort to hardware. Use a timer to toggle a pin.

JW
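
To put numbers on the timer suggestion, here is a minimal sketch of how the period and compare values for a 50 % duty square wave could be chosen. It assumes an edge-aligned, up-counting PWM with prescaler 0, where the period is ARR + 1 timer ticks and the output is high for CCR ticks. The 200 MHz timer clock used in the usage note is an assumption (half of the 400 MHz core clock mentioned in the question), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Period and compare values for an edge-aligned PWM with PSC = 0. */
typedef struct {
    uint32_t arr;  /* auto-reload value: period is (arr + 1) timer ticks */
    uint32_t ccr;  /* compare value: output is high for ccr ticks */
} pwm_settings_t;

/* Hypothetical helper: choose ARR/CCR so that consecutive edges are as
 * close as possible to edge_ns apart (50 % duty cycle). */
static pwm_settings_t pwm_for_edge_spacing(uint32_t tim_hz, uint32_t edge_ns)
{
    assert(tim_hz != 0 && edge_ns != 0);

    /* Timer ticks per half period, rounded to the nearest tick. */
    uint64_t half_ticks =
        ((uint64_t)tim_hz * edge_ns + 500000000ull) / 1000000000ull;
    if (half_ticks == 0) {
        half_ticks = 1;  /* cannot go faster than one tick per edge */
    }

    pwm_settings_t s;
    s.arr = (uint32_t)(2u * half_ticks - 1u);
    s.ccr = (uint32_t)half_ticks;
    return s;
}

/* Resulting output frequency in Hz for the chosen settings. */
static uint32_t pwm_output_hz(uint32_t tim_hz, pwm_settings_t s)
{
    return tim_hz / (s.arr + 1u);
}
```

Under the 200 MHz assumption, `pwm_for_edge_spacing(200000000u, 15u)` yields ARR = 5 and CCR = 3, which corresponds to about 33.3 MHz with edges 15 ns apart. A free-running square wave like this only covers a clock-like signal; it does not by itself shift out data.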

HTajb.1
Author
Associate III
August 28, 2020

Hello KnarfB, TDK, waclawek.jan:

Thank you so much for your time and for replying to my questions.

In my project I need to communicate with other chips using a protocol similar to SPI (but not exactly SPI). So I need to transfer data to them serially (the STM32 is the master), and to achieve this I suppose I have to use GPIOx->BSRR somehow.

To sum up:

The core of the STM32H750 is not directly connected to the GPIO pins: there are 3 (three) bus matrices between them, and that is the origin of the delay. So moving my code to internal RAM, to be executed by the CPU at zero wait states, will not help either. Right?

You know, I studied a lot before ultimately choosing the STM32H750. I even tried to estimate the processing power of this chip with a simple multiplication (5-digit numbers) in a loop. You won't believe it: the loop count was extremely high, and still this chip finished the processing in the blink of an eye! (I think the FPU was responsible for the processing in this case.)

I think I have to choose another STM32 family; most probably the new chip will be the STM32F767VIT6 (LQFP-100 package). If I select this chip, could you (or one of the ST employees) confirm that it will solve my problem?

I really appreciate your attention.

KnarfB
Super User
August 28, 2020

You can also write to GPIOx->BSRR using a DMA transfer. When the CPU does the writes, the timing will not be deterministic, especially once interrupts, caching etc. come into play. I would give DMA a try before changing horses.
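
For an SPI-like protocol, the words that the DMA (or the CPU) writes to BSRR can be prepared in a buffer ahead of time. The sketch below builds two BSRR words per data bit, MSB first. The pin numbers, the macro names and the function name are hypothetical, and it assumes the clock and data lines are on the same GPIO port. It relies only on the BSRR layout: bits 0..15 set a pin, bits 16..31 reset it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin assignment, both pins on the same GPIO port. */
#define CLK_PIN   0u
#define DATA_PIN  1u

#define BSRR_SET(pin)    (UINT32_C(1) << (pin))          /* low half sets a pin */
#define BSRR_RESET(pin)  (UINT32_C(1) << ((pin) + 16u))  /* high half resets a pin */

/* Fill out[] with two BSRR words per bit of value, MSB first:
 *   word 0: drive the data pin to the bit value and pull the clock low
 *   word 1: raise the clock (data pin is left unchanged)
 * Returns the number of words written (always 16). */
static size_t bsrr_words_for_byte(uint8_t value, uint32_t out[16])
{
    assert(out != NULL);

    size_t n = 0;
    for (int bit = 7; bit >= 0; --bit) {
        uint32_t data = ((value >> bit) & 1u) ? BSRR_SET(DATA_PIN)
                                              : BSRR_RESET(DATA_PIN);
        out[n++] = data | BSRR_RESET(CLK_PIN);
        out[n++] = BSRR_SET(CLK_PIN);
    }
    return n;
}
```

Each entry of the buffer would then be written to GPIOx->BSRR in order, either in a loop or by a timer-paced DMA stream, so the bit pattern costs no computation at transfer time.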

TDK
August 28, 2020

Turns out that's even more limited than the CPU transfer. This is the fastest I could get using TIM1 to trigger a DMA transfer to BSRR:

[Image attachment: 0693W000002lglaQAA.png]

The DMA transfer error gets set as well, due to the underrun.

Theoretically, BDMA should be much quicker at this but I don't see a way to set a periodic trigger for the transfer.
waclawek.jan
Super User
August 28, 2020

>Turns out [DMA is] even more limited than the CPU transfer.

The write from DMA has to traverse almost the same bus matrix structure as a write from the CPU (okay, somewhat less, but on the slower side of the chip), plus the DMA has to perform the memory-side read and arbitration. So the raw back-to-back write speed is lower.

Latencies in a real program may be better than when toggling a pin in an interrupt; that's the real value of using DMA.

> or one the ST employees confirm

This is primarily a user-driven forum, with only casual ST presence. You may want to contact ST directly, through the web support form or through an FAE.

But first, you should maybe try it yourself on a Nucleo board, F7 or maybe even F4. After all, the real exercise is not just about toggling a pin, is it? There may be more surprises lying ahead, as you want to use the chip in a way different from how it is intended to be used. At the end of the day, you may want to reconsider your expectations.

JW

HTajb.1
Author
Associate III
September 8, 2020

Dear experts KnarfB, TDK, waclawek.jan,

Thanks again for replying to my questions.

For your information: I moved some parts of my application to the ITCM RAM (as KnarfB suggested), but the result was the same (probably because I had already enabled the instruction cache in the previous version).

I am also linking this discussion for those who may face the same problem in the future:

https://community.st.com/s/question/0D50X00009XkWN7SAN/stm32h7-gpio-togle-max-frequency

Regards,

HT

December 19, 2020

Hi,

I also got only 20MHz on an STM32H743VIT6 (480MHz), using a DSO1511e oscilloscope (120MHz bandwidth).

Code:

	 GPIOA->ODR = 0xFFFF0000;
 
	 GPIOA->ODR = 0x0000FFFF;

I noticed that the probe that comes with this oscilloscope is of poor quality.

So I did other tests. With an STM32F407VG running at 72MHz, GPIO did 14.3MHz. And with 168MHz the oscilloscope did not read correctly.

I replaced the oscilloscope probe with another one that I had purchased for 60MHz. It was now possible to read 33MHz on the F407 @ 168MHz clock.

I believe the probe has such a low bandwidth that the oscilloscope reads only some harmonics instead of the main frequency, which may be why several people on the internet report such low frequencies on STM32F or STM32H pins.

Before testing the pin frequency, it is worth testing the measuring equipment with a reliable signal source.

It does not make sense to get a pin frequency of x MHz with a 72 MHz clock and a lower pin frequency when the clock frequency of the uC (microcontroller) is higher.

If the measuring equipment has not been verified, I believe a good approach is to increase the uC clock frequency gradually: start with a frequency that the equipment can read without instability, then step it up. When the reading becomes unstable, the equipment's bandwidth is probably no longer sufficient.

December 19, 2020

I went back to testing the STM32H7 after changing the oscilloscope probe.

But to my sadness, it really only reaches 20 MHz on the PA0 pin with a 480 MHz clock, and only 3 MHz with a 72 MHz clock.

It is unfortunate that a cannon (STM32H7) cannot beat a bazooka (STM32F4).

I bought the H7 to get better pin frequency performance, but now I see that it was a terrible purchase and lost money.
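
One way to sanity-check such readings is to express each of them as core clocks per output period and see whether they agree across clock settings. The sketch below does that with integer arithmetic; the struct and function names are hypothetical, and the inputs are simply the figures quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t core_mhz;  /* core clock during the measurement */
    uint32_t pin_mhz;   /* frequency read on the pin */
} pin_measurement_t;

/* Core clocks per output period, rounded to the nearest integer. */
static uint32_t clocks_per_period(pin_measurement_t m)
{
    assert(m.pin_mhz != 0);
    return (m.core_mhz + m.pin_mhz / 2u) / m.pin_mhz;
}

/* True if two measurements cost the same number of core clocks per
 * output period, within tol clocks. */
static bool same_cost(pin_measurement_t a, pin_measurement_t b, uint32_t tol)
{
    uint32_t ca = clocks_per_period(a);
    uint32_t cb = clocks_per_period(b);
    uint32_t diff = (ca > cb) ? (ca - cb) : (cb - ca);
    return diff <= tol;
}
```

Both H7 readings (20 MHz at 480 MHz and 3 MHz at 72 MHz) come out at 24 core clocks per period, while the F407 reading of 33 MHz at 168 MHz comes out at 5. That the H7 figure stays constant across clock settings suggests a fixed cost in clock cycles per write rather than a measurement artifact.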

TDK
December 19, 2020
Why are you choosing micros based on how fast you can toggle a pin in a main loop?
December 19, 2020

Because it is the most generic form of testing. Many people use software to perform GPIO manipulation, and this low speed will directly impact expected performance. Few people develop software at a low level.

What would be your suggestion for a better way to test the maximum frequency of a GPIO pin (General Purpose Input / Output)?

Uwe Bonnes
Chief
December 19, 2020

On the F723, bit-banging for SWD can exceed 20 MHz during the transfer. Not just toggling, but doing something useful. Look at the StlinkV3.

AOliv.1
Associate
December 19, 2020

https://community.st.com/s/question/0D53W00000RRSs5SAH/internal-ram-serving-like-eprom-emulator

The STM32F417 is faster at pin manipulation than the STM32H750. Disappointed.