Toggling a GPIO pin at 50+ MHz

con3 · ‎2017-11-06

Posted on November 06, 2017 at 16:22

The original post was too long to process during our migration. Please click on the attachment to read the original post.

Martin HUBIK · ‎2017-11-09

Posted on November 09, 2017 at 11:38

Just to give another view on this. Suppose you use the code that was suggested above:

while (1)

{

GPIOC->ODR = GPIO_PIN_9;

GPIOC->ODR = 0;

}

It will assemble into a simple STR Rx, [Ry], where Rx holds the value you are writing and Ry holds the address of the ODR register. According to section 3.3.3, Load/store timings, of the Technical Reference Manual from ARM (Cortex M7_r0p2.pdf), this instruction is single cycle. So you might expect the pin to toggle at half of the SystemClock frequency, which is indeed 108 MHz (for a 216 MHz clock).

At the end of the while loop there will be an unconditional branch back to its beginning. This takes 1 + P cycles, where P ranges from 1 to 3 depending on the alignment and width of the target instruction, and on whether the processor manages to speculate the address early. On F4 devices you would probably see a discontinuity in the waveform when this happens. On F7 you might not, because of the superscalar nature of the core.

Check the following simple benchmark, which was executed on F4 and F7.

C code

for (n = 0; n < NUM_SAMPLES; n++)

{

acc += array1[n];

}

Assembly

??main_1:

LDR R2,[R0], #+4 // load the next value from array1

ADDS R4,R2,R4 // add it to accumulator

SUBS R1,R1,#+1 // decrement loop counter

BNE.N ??main_1 // loop back

Core   Cycles for 500 iterations   Cycles per iteration

M7     1024                        2.048

M4     3006                        6.012
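
The arithmetic behind these figures can be sketched in a few lines of C. This is only an idealized model, assuming one cycle per STR and 1 + P cycles for the taken branch as quoted above; it ignores bus wait states, write buffering and dual issue. The function names, and the 216 MHz clock used in the note below, are illustrative assumptions and not part of the original benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Idealized cycle count for one pass of a loop that contains `stores`
 * single-cycle STR instructions plus one taken branch (1 + P cycles).
 * P is 1..3, per the figure quoted in the post. */
static uint32_t loop_cycles(uint32_t stores, uint32_t p)
{
    assert(p >= 1 && p <= 3);
    return stores + 1u + p;
}

/* Output frequency in Hz when each pass of `cycles` cycles produces
 * `periods` full high/low periods on the pin. */
static double toggle_hz(double sysclk_hz, uint32_t cycles, uint32_t periods)
{
    assert(cycles > 0);
    return sysclk_hz * (double)periods / (double)cycles;
}

/* Average cycles per iteration, as reported in the benchmark table. */
static double cycles_per_iteration(uint32_t total_cycles, uint32_t iterations)
{
    assert(iterations > 0);
    return (double)total_cycles / (double)iterations;
}
```

Here cycles_per_iteration(1024, 500) gives 2.048 and cycles_per_iteration(3006, 500) gives 6.012, matching the table. Under the same model, a two-store loop with P = 1 takes loop_cycles(2, 1) = 4 cycles, so toggle_hz(216e6, 4, 1) is 54 MHz for an assumed 216 MHz clock, rather than the 108 MHz that back-to-back stores alone would suggest.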

As was already said, it is important that the caches are enabled to compensate for slow flash access: either the ART cache when the flash is accessed through ITCM, or the core caches when it is accessed over the AXI bus.

Best regards,

Martin

STMicroelectronics, Microcontroller Application Support Engineer

alexandre239955_stm1 · ‎2017-11-09

Posted on November 09, 2017 at 14:43

You could use the BSRR register:

while(1)

{

GPIOC->BSRR = (1 << 9); // Set bit 9

GPIOC->BSRR = (1 << (9+16)); // Reset bit 9

}
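
To illustrate why BSRR is attractive here, the following host-side sketch models the register's documented behaviour: bits 0..15 set the corresponding ODR bits, bits 16..31 reset them, and set wins if both are requested for the same pin. The function names are hypothetical and nothing here touches real hardware; it only shows that a BSRR write leaves the other pins alone, whereas a plain ODR write overwrites all of them.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a write to GPIOx_BSRR: returns the resulting ODR value.
 * Low half-word = pins to set, high half-word = pins to reset;
 * set has priority when both bits for a pin are written as 1. */
static uint16_t bsrr_write(uint16_t odr, uint32_t value)
{
    uint16_t set   = (uint16_t)(value & 0xFFFFu);
    uint16_t reset = (uint16_t)(value >> 16);
    return (uint16_t)((odr & (uint16_t)~reset) | set);
}

/* BSRR value that sets one pin (0..15). */
static uint32_t bsrr_set(unsigned pin)
{
    assert(pin < 16);
    return 1u << pin;
}

/* BSRR value that resets one pin (0..15). */
static uint32_t bsrr_reset(unsigned pin)
{
    assert(pin < 16);
    return 1u << (pin + 16);
}
```

Starting from ODR = 0x0005, bsrr_write(0x0005, bsrr_set(9)) yields 0x0205, and a following bsrr_write(0x0205, bsrr_reset(9)) returns to 0x0005. Writing GPIO_PIN_9 and then 0 straight to ODR would instead clear bits 0 and 2 as well.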

Another way to optimize is to use bit-banding, because it doesn't affect the other pins of the same port.

__no_init volatile unsigned int GPIOC_ODR_9 @ 0x424102A4;

while(1)

{

GPIOC_ODR_9 = 1;

GPIOC_ODR_9 = 0;

}

LMI2 · ‎2017-11-09

Posted on November 09, 2017 at 15:37

Interesting

What does __no_init volatile unsigned int GPIOC_ODR_9 @ 0x424102A4; do? Especially __no_init and @ 0x424102A4;

I have seen __commands in Windows programming but have forgotten their meaning.

And @ 0x424102A4: is this CPU specific, and what does it do?

Regards

Leif M

waclawek.jan · ‎2017-11-09

Posted on November 09, 2017 at 15:55

That declaration is a non-ANSI extension of some compiler, IAR I guess.

0x424102A4 is the bit-band alias of 0x40020815 bit 1 (or, better, 0x40020814 bit 9). 0x40020800 is the base address of GPIOC in 'F2/'F4 (maybe some other families too, but certainly not all), and 0x14 is the offset of the ODR register.
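
The alias arithmetic can be checked with a small host-side sketch. It assumes the standard Cortex-M3/M4 peripheral bit-band mapping (region at 0x40000000, alias at 0x42000000, one 32-bit alias word per bit); the macro and function names are made up for this example and are not taken from any vendor header.

```c
#include <assert.h>
#include <stdint.h>

#define BB_PERIPH_REGION 0x40000000u  /* start of peripheral bit-band region */
#define BB_PERIPH_ALIAS  0x42000000u  /* start of the matching alias region  */

/* Alias word address for bit `bit` counted from the byte at `addr`.
 * Each byte of the 1 MB region maps to 32 bytes of alias space,
 * and each bit to one 4-byte alias word. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    assert(addr >= BB_PERIPH_REGION && addr < BB_PERIPH_REGION + 0x100000u);
    assert(bit < 32);
    return BB_PERIPH_ALIAS + (addr - BB_PERIPH_REGION) * 32u + bit * 4u;
}
```

Both bitband_alias(0x40020814, 9) and bitband_alias(0x40020815, 1) evaluate to 0x424102A4, which is the address used in the declaration being asked about.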

I wouldn't use bit-banding for GPIO, though; the BSRR mechanism is superior in every respect.

JW

Tesla DeLorean · ‎2017-11-09

Posted on November 09, 2017 at 16:55

I think I also mentioned the BSRR approach.

>> Another way to optimize is to use bit-banding, because it doesn't affect the other pins of the same port.

Optimize? You have 4 bus transactions in the loop, since each bit-band write is performed as a read-modify-write.

Bit-banding is not efficient, and in the peripheral space it frequently induces hazards, e.g. with TIM->SR.

If you can get the address into a register, outside the loop, the compiler might do a better job even with optimization off.

volatile uint32_t *bsrr = (volatile uint32_t *)&GPIOC->BSRR; // volatile, so no store is optimized away

while(1)

{

*bsrr = (1 << 9); // Set bit 9

*bsrr = (1 << (9+16)); // Reset bit 9

}

To be reasonably optimal, you want the generated code to take this form:

loop

str r1,[r0, #0]

str r2,[r0, #0]

str r1,[r0, #0]

str r2,[r0, #0]

str r1,[r0, #0]

str r2,[r0, #0]

str r1,[r0, #0]

str r2,[r0, #0]

..

b loop
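
A C-level sketch of that unrolled pattern might look like the following. It assumes the register address is passed in as a volatile pointer so that the compiler keeps every store; the function name and the unroll factor of four are arbitrary choices for illustration, and whether the compiler emits exactly the STR sequence above depends on the toolchain and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit four set/reset pairs back to back, with no branch in between.
 * `bsrr` would be &GPIOC->BSRR on the target; on a host it can point
 * at any uint32_t for testing. */
static void pulse_x4(volatile uint32_t *bsrr, unsigned pin)
{
    assert(bsrr != NULL && pin < 16);
    const uint32_t set   = 1u << pin;         /* loop-invariant set mask   */
    const uint32_t reset = 1u << (pin + 16);  /* loop-invariant reset mask */

    *bsrr = set;  *bsrr = reset;
    *bsrr = set;  *bsrr = reset;
    *bsrr = set;  *bsrr = reset;
    *bsrr = set;  *bsrr = reset;
}
```

On the target this would be called as pulse_x4(&GPIOC->BSRR, 9) inside while (1), so the branch cost is paid once per four pulses instead of once per pulse.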


Martin HUBIK · ‎2017-11-09

Posted on November 09, 2017 at 17:06

But most importantly, bit-banding is not present on the M7 core. It is only present on the M3 and M4 cores. You will not find it on M0 and M0+ either.

waclawek.jan · ‎2017-11-09

Posted on November 10, 2017 at 01:23

Benchmarking is fun, huh? :)

http://www.efton.sk/STM32/r.png

is the waveform for a pin toggle using BSRR, with the code at

http://www.efton.sk/STM32/r.c

(USE_BB was commented out). The top track is EVENTOUT, i.e. pulses imposed by SEV; the middle track is the output in question; the bottom track is the system clock (default 16 MHz in an 'F407 - it was on a DISCO-F4).

http://www.efton.sk/STM32/r_bb.png

is the waveform for the same pin toggle using bit-banding; the code is the same with USE_BB uncommented, and the track arrangement is the same. Note that the processor, attempting the second write, had to wait until the first write was 'executed' by the bit-band logic (seen in the delay between the second and third SEV pulses) - and as a consequence, the resulting output pulse is longer.

Just for fun,

http://www.efton.sk/STM32/r3_DISDEFWBUF.png

is a comparison of the waveforms resulting from the very same first (BSRR-using) code, except that for the lower set of waveforms the processor write buffer has been switched off (using the DISDEFWBUF bit in SCB_ACTLR).

The thread discussing this is at

https://list.hw.cz//pipermail/hw-list/2013-April/438309.html

but it's lengthy, with a lot of detours, and probably only Martin will appreciate the language ;) One maybe surprising result of these experiments was the insight that one can't programmatically create a 2-cycle pulse... (discussed in the very last post of that thread).

JW