How to toggle a GPIO pin very fast in STM32H750?

HTajb.1
Associate III

Dear Sir/Madam,

I chose the STM32H750VBT6 (LQFP-100 package, revision Y, max core clock 400 MHz) for my project, based on my requirements.

Everything was fine until I noticed that this magnificent MCU cannot toggle a GPIO pin faster than about 30 ns (a rough estimate). I should emphasize that to reach even this speed I had to avoid HAL_GPIO_WritePin and HAL_GPIO_TogglePin and write to GPIOx->BSRR directly in a very simple loop.

If I can reduce this time to 15 ns (or less), I will reach my goal.

I searched on the Internet and found a very useful discussion in this link:

https://community.st.com/s/question/0D50X00009cdgGjSAI/stm32h7xx-fast-gpio-toggle

It seems that the H7 series has an intrinsic limit here.

Here are my questions:

1. If I use assembly language directly, can I reach this speed (15 ns)?

2. If I use revision V of this chip (LQFP-100 package), will it solve my problem (considering possible internal hardware differences from revision Y)?

3. I applied the necessary settings in my program (enabling I/O cell compensation and so on), but none of them helped. Since this chip is still new to me, maybe I don't know the right settings or the right sequence for them. Can anybody help me with this?

Your consideration is highly appreciated.

Accepted Solutions

Welcome to the wonderful world of 32 bits.

In the 'H7, GPIO sits on a bus clocked at max. 240 MHz, separated from the processor by 3 (three) bus matrices. This is not your tightly coupled 8-bitter with clean and straightforward timing anymore. In fact, despite being nominally 2 orders of magnitude faster, the "control granularity" has remained roughly the same; only the number-crunching capability keeps increasing.

For control, you have to resort to hardware. Use a timer to toggle a pin.

JW
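To put rough numbers on the timer suggestion, here is a minimal host-side sketch (not from the original posts). It assumes the timer runs in output-compare toggle mode, where the output flips once per counter period of ARR + 1 ticks, and it treats the timer kernel clock as a parameter: 240 MHz matches the bus figure quoted above, 200 MHz is an assumed value for a 400 MHz part. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Edge spacing in picoseconds for a timer in toggle mode:
   one output edge per counter period, which lasts ARR + 1 ticks. */
uint32_t timer_edge_spacing_ps(uint32_t tim_clk_hz, uint32_t arr)
{
    assert(tim_clk_hz > 0);
    return (uint32_t)((1000000000000ULL * (arr + 1ULL)) / tim_clk_hz);
}

/* Largest ARR whose edge spacing does not exceed target_ps.
   Returns -1 if even ARR = 0 (one tick per edge) is too slow.
   Clamped to a 16-bit reload value (assumption: 16-bit timer). */
int32_t arr_for_edge_spacing(uint32_t tim_clk_hz, uint32_t target_ps)
{
    uint64_t ticks = ((uint64_t)target_ps * tim_clk_hz) / 1000000000000ULL;
    if (ticks == 0) {
        return -1;
    }
    if (ticks > 65536ULL) {
        ticks = 65536ULL;
    }
    return (int32_t)(ticks - 1ULL);
}
```

With a 240 MHz timer clock and a 15 ns target this gives ARR = 2, that is an edge every 12.5 ns. Whether the pad and the load can follow such edges is a separate question that this arithmetic does not answer.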

19 REPLIES
KnarfB
Principal III

I'm not asking why you want to do that. But is your code executed from RAM? If not, you should consider it. A release build using STM32CubeIDE (gcc) with -Ofast (and -g1) produces a single machine instruction for each toggle, like:

    LED_GPIO_Port->BSRR = LED_Pin;
080002fc:   movs    r3, #8
080002fe:   ldr     r2, [pc, #12]   ; (0x800030c <main+164>)
08000300:   str     r3, [r2, #24]
    LED_GPIO_Port->BRR = LED_Pin;
08000302:   str     r3, [r2, #40]   ; 0x28
08000304:   b.n     0x8000300 <main+152>

So there is no advantage in going the assembler route.

TDK
Guru

I tried this and am a bit unimpressed at the speed.

I'm running at 480 MHz clock speed. STM32H743 board. Instruction cache enabled.

I wrote the following code:

while (1) {
    GPIOC->BSRR = (1 << 10);
    GPIOC->BSRR = (1 << 26);
    GPIOC->BSRR = (1 << 10);
    GPIOC->BSRR = (1 << 26);
    GPIOC->BSRR = (1 << 10);
    GPIOC->BSRR = (1 << 26);
    GPIOC->BSRR = (1 << 10);
    GPIOC->BSRR = (1 << 26);
    GPIOC->BSRR = (1 << 10);
    GPIOC->BSRR = (1 << 26);
    GPIOC->BSRR = (1 << 10);
    GPIOC->BSRR = (1 << 26);
  }

This got translated to the following assembly:

> 0x8009178 <main_cm7()+168>:	ldr	r3, [pc, #36]	; (0x80091a0 <main_cm7()+208>)
   0x800917a <main_cm7()+170>:	mov.w	r1, #1024	; 0x400
   0x800917e <main_cm7()+174>:	mov.w	r2, #67108864	; 0x4000000
   0x8009182 <main_cm7()+178>:	str	r1, [r3, #24]
   0x8009184 <main_cm7()+180>:	str	r2, [r3, #24]
   0x8009186 <main_cm7()+182>:	str	r1, [r3, #24]
   0x8009188 <main_cm7()+184>:	str	r2, [r3, #24]
   0x800918a <main_cm7()+186>:	str	r1, [r3, #24]
   0x800918c <main_cm7()+188>:	str	r2, [r3, #24]
   0x800918e <main_cm7()+190>:	str	r1, [r3, #24]
   0x8009190 <main_cm7()+192>:	str	r2, [r3, #24]
   0x8009192 <main_cm7()+194>:	str	r1, [r3, #24]
   0x8009194 <main_cm7()+196>:	str	r2, [r3, #24]
   0x8009196 <main_cm7()+198>:	str	r1, [r3, #24]
   0x8009198 <main_cm7()+200>:	str	r2, [r3, #24]
   0x800919a <main_cm7()+202>:	b.n	0x8009182 <main_cm7()+178>

So, one str instruction per line of C.

The result is a square wave at around 20 MHz:

0693W000002lbkIQAQ.png

So no, you can't reach 15 ns. The lowest you can get is around 25 ns per edge, at least on the chip that I have.
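The arithmetic behind these figures can be sketched as follows (an illustration, not part of the original post; the function names are hypothetical). A square wave has two edges per period, so the edge spacing is half the period.

```c
#include <assert.h>
#include <stdint.h>

/* Time between consecutive edges of a square wave, in picoseconds.
   Two edges per period, so the spacing is half a period. */
uint32_t edge_spacing_ps(uint32_t freq_hz)
{
    assert(freq_hz > 0);
    return (uint32_t)(1000000000000ULL / (2ULL * freq_hz));
}

/* Core clock cycles that elapse between two consecutive edges. */
uint32_t core_cycles_per_edge(uint32_t core_hz, uint32_t freq_hz)
{
    assert(freq_hz > 0);
    return (uint32_t)(core_hz / (2ULL * freq_hz));
}
```

For the measured 20 MHz output this gives 25 ns per edge, or 12 core cycles per store at 480 MHz. A 15 ns edge spacing would correspond to a square wave of roughly 33.3 MHz.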

Of course, you can get way higher through other methods.

If you feel a post has answered your question, please click "Accept as Solution".

HTajb.1
Associate III

Hello KnarfB, TDK, waclawek.jan:

Thank you so much for taking the time to reply to my questions.

In my project, I need to communicate with other chips using a protocol similar to SPI (but NOT exactly SPI). So I need to transfer data to them serially (the STM32 is the master), and to achieve this I suppose I have to use GPIOx->BSRR somehow.

To sum up:

The core of the STM32H750 is not directly connected to the GPIO pins; there are 3 (three) bus matrices between them, and that is the origin of the delay. So even if I move my code to internal RAM, to be executed by the CPU at zero wait states, that will not help either.

(Right?)

You know, I studied a lot before ultimately choosing the STM32H750. I even tried to estimate the processing power of this chip with a simple multiplication (5-digit numbers) in a loop. You won't believe it: the loop count was extremely high and still this chip finished the processing in the blink of an eye! (I think the FPU was responsible for the processing in this case.)

I think I have to choose another STM32 family; most probably the new chip will be the STM32F767VIT6 (LQFP-100 package). If I select this chip, could you (or one of the ST employees) confirm that this will solve my problem?

I really appreciate your attention.

You can write to GPIOx->BSRR using a DMA transfer too. When the CPU does the writes, the timing will not be deterministic, especially when interrupts, caching etc. come into play. I would give DMA a try before changing horses.
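As an illustration of what such a DMA source buffer might look like for an SPI-like serial protocol, here is a minimal sketch. The pin numbers, the macro names and the clocking scheme (data changes while the clock is low, clock rises afterwards) are assumptions, not details from this thread. It relies only on the BSRR layout already used in the posts above: bits 0..15 set a pin, bits 16..31 reset it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin assignment on one GPIO port. */
#define CLK_PIN   5u
#define DATA_PIN  6u

/* BSRR: lower half-word sets pins, upper half-word resets them. */
#define BSRR_SET(pin)    (1u << (pin))
#define BSRR_RESET(pin)  (1u << ((pin) + 16u))

/* Encode one byte, MSB first, as 16 BSRR words.
   Per bit: word 0 drives the clock low and presents the data bit,
   word 1 raises the clock. Returns the number of words written. */
size_t encode_byte_bsrr(uint8_t value, uint32_t out[16])
{
    size_t n = 0;
    for (int bit = 7; bit >= 0; --bit) {
        uint32_t data = ((value >> bit) & 1u) ? BSRR_SET(DATA_PIN)
                                              : BSRR_RESET(DATA_PIN);
        out[n++] = data | BSRR_RESET(CLK_PIN);
        out[n++] = BSRR_SET(CLK_PIN);
    }
    return n;
}
```

A timer-paced DMA stream would then copy these words to BSRR one per request. The same buffer could also be written by a plain CPU loop, so the encoding step is independent of how the writes are paced.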

Turns out that's even more limited than the CPU transfer. This is the fastest I could get using TIM1 to trigger a DMA transfer to BSRR:

0693W000002lglaQAA.png

The DMA transfer error gets set as well, due to the underrun.

Theoretically, BDMA should be much quicker at this but I don't see a way to set a periodic trigger for the transfer.

>Turns out [DMA is] even more limited than the CPU transfer.

A write from DMA has to traverse almost the same bus matrix structure as a write from the CPU (okay, somewhat less, but on the slower side of the chip), plus the DMA has to perform the memory-side read and arbitration. So the raw back-to-back write speed is lower.

In a real program the latencies may be better than when toggling a pin in an interrupt; that's the real value of using DMA.

> or one the ST employees confirm

This is primarily a user-driven forum, with only casual ST presence. You may want to contact ST directly, through the web support form or through an FAE.

But first, maybe you should try it yourself on a Nucleo board, F7 or maybe even F4. After all, the real exercise is not just about toggling a pin, is it? There may be more surprises lying ahead, as you want to use the chip in a way other than it was intended to be used. At the end of the day, you may want to reconsider your expectations.

JW

HTajb.1
Associate III

Dear experts KnarfB, TDK, waclawek.jan,

Thanks again for replying to my questions.

For your information: I moved some parts of my application to the ITCM RAM (as KnarfB suggested), but the result was the same. (I suppose this is because I had already enabled the instruction cache in the previous version.)

I also refer to this discussion for those who may face this problem in the future:

https://community.st.com/s/question/0D50X00009XkWN7SAN/stm32h7-gpio-togle-max-frequency

Regards,

HT

Not applicable

Hi,

I also got only 20MHz on an STM32H743VIT6 (480MHz), using a DSO1511e oscilloscope (120MHz bandwidth).

Code:

	  GPIOA->ODR = 0xFFFF0000;
 
	  GPIOA->ODR = 0x0000FFFF;

I noticed that the probe that comes with this oscilloscope is of poor quality.

So I did other tests. With an STM32F407VG running at 72 MHz, the GPIO toggled at 14.3 MHz. At 168 MHz the oscilloscope did not read correctly.

I replaced the oscilloscope probe with another one I had bought, rated for 60 MHz. It was now possible to read 33 MHz on the F407 at a 168 MHz clock.

I believe that the probe has such a low bandwidth that it makes the oscilloscope read only some harmonics instead of the main frequency, which may be why several people on the internet report reading such low frequencies on STM32F or STM32H pins.

Before measuring the pin frequency, it is worth checking the measuring equipment against a reliable signal source.

It does not make sense to measure a pin frequency of x MHz at a 72 MHz clock and then a lower pin frequency when the microcontroller (uC) clock is higher.

If the measuring equipment has not been verified, I believe a good approach is to increase the uC clock frequency gradually: start with a frequency that the equipment can read without instability, then step it up. When the reading becomes unstable, it is likely that the equipment's bandwidth is no longer sufficient.
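This sanity check can be made concrete with a small sketch. It assumes that the toggle loop costs a fixed number of core cycles per write, so that the pin frequency scales linearly with the core clock; bus prescalers or flash wait states can break this assumption. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expected pin frequency (kHz) at a new core clock, assuming the
   toggle rate scales linearly with the core clock. */
uint32_t scaled_toggle_khz(uint32_t measured_khz,
                           uint32_t measured_core_mhz,
                           uint32_t new_core_mhz)
{
    assert(measured_core_mhz > 0);
    return (uint32_t)(((uint64_t)measured_khz * new_core_mhz)
                      / measured_core_mhz);
}
```

Scaling the 14.3 MHz measured at 72 MHz up to 168 MHz predicts about 33.4 MHz, which agrees with the 33 MHz read with the better probe. A reading far below such a prediction is a hint to check the measuring equipment first.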