2017-11-06 07:22 AM
2017-11-09 02:38 AM
Just to give another view on this, suppose you use the code which was suggested above:
    while (1)
    {
        GPIOC->ODR = GPIO_PIN_9;
        GPIOC->ODR = 0;
    }
Each write will assemble into a simple STR Rx, [Ry], where Rx holds the value you are writing to the ODR register and Ry holds the address of that register. According to section 3.3.3, Load/store timings, of the Technical Reference Manual from ARM (Cortex M7_r0p2.pdf), this instruction is single cycle. So you might expect to toggle the pin at half of the SystemClock frequency, which is indeed 108 MHz.
At the end of the while loop there will be an unconditional branch back to the beginning of the loop. This takes 1 + P cycles, where P ranges from 1 to 3 depending on the alignment and width of the target instruction, and on whether the processor manages to speculate the address early. On F4 devices you would probably see a discontinuity in the output when this happens. On F7 you might not, because of the superscalar nature of the core. Check the following simple benchmark, which was executed on F4 and F7.
C code

    for (n = 0; n < NUM_SAMPLES; n++)
    {
        acc += array1[n];
    }

Assembly

    ??main_1:
        LDR   R2,[R0], #+4   // load the next value from array1
        ADDS  R4,R2,R4       // add it to accumulator
        SUBS  R1,R1,#+1      // decrement loop counter
        BNE.N ??main_1       // loop back

    core   Cycles for 500 iterations   Cycles for 1 iteration
    M7     1024                        2.048
    M4     3006                        6.012

As was already said, it is important that the caches are enabled to compensate for slow flash access: either the ART caches when ITCM is used, or the core caches when the AXI bus is used.
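The per-iteration column in the table above is simply the total cycle count divided by the 500 iterations. A minimal sketch of that arithmetic follows; the helper name millicycles_per_iteration is made up for illustration, and it works in thousandths of a cycle so that it stays in integer math:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cycles per loop iteration, expressed in
 * thousandths of a cycle so no floating point is needed. */
uint32_t millicycles_per_iteration(uint32_t total_cycles, uint32_t iterations)
{
    assert(iterations > 0u);
    /* Widen before multiplying so large cycle counts cannot overflow. */
    return (uint32_t)(((uint64_t)total_cycles * 1000u) / iterations);
}
```

With the figures from the table, 1024 cycles over 500 iterations gives 2048 (2.048 cycles) and 3006 cycles gives 6012 (6.012 cycles).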
Best regards,
Martin
STMicroelectronics, Microcontroller Application Support Engineer
2017-11-09 06:43 AM
You could use the BSRR register:
    while (1)
    {
        GPIOC->BSRR = (1 << 9);        // Set bit 9
        GPIOC->BSRR = (1 << (9 + 16)); // Reset bit 9
    }
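To see why BSRR writes leave the other pins of the port alone while a plain ODR write does not, here is a small host-side model. It is only a sketch: gpio_model_t and the two model_write_* functions are hypothetical names, and the model assumes the usual STM32 layout where the low 16 bits of BSRR set pins and the high 16 bits reset them, with set taking priority.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of a 16-pin port's output data register. */
typedef struct {
    uint32_t odr;
} gpio_model_t;

/* A plain ODR write replaces the state of all 16 pins at once. */
void model_write_odr(gpio_model_t *port, uint32_t value)
{
    assert(port != 0);
    port->odr = value & 0xFFFFu;
}

/* A BSRR write only touches the pins whose set or reset bit is 1. */
void model_write_bsrr(gpio_model_t *port, uint32_t value)
{
    assert(port != 0);
    uint32_t set_mask   = value & 0xFFFFu;          /* low half: set requests    */
    uint32_t reset_mask = (value >> 16) & 0xFFFFu;  /* high half: reset requests */
    /* Clear first, then set, so a set request wins if both bits are 1. */
    port->odr = (port->odr & ~reset_mask) | set_mask;
}
```

Starting with pin 0 high, the two BSRR writes from the loop above take the modeled ODR from 0x0001 to 0x0201 and back to 0x0001, whereas writing (1 << 9) directly to ODR would leave 0x0200 and drop pin 0.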
Another way to optimize is to use bit-banding, because it doesn't affect the other pins of the same port.

    __no_init volatile unsigned int GPIOC_ODR_9 @ 0x424102A4;
    while (1)
    {
        GPIOC_ODR_9 = 1;
        GPIOC_ODR_9 = 0;
    }
2017-11-09 07:37 AM
Interesting
What does __no_init volatile unsigned int GPIOC_ODR_9 @ 0x424102A4; do? Especially __no_init and @ 0x424102A4.
I have seen __ keywords in Windows programming but have forgotten their meaning.
And is @ 0x424102A4 CPU specific, and what does it do?
Regards
Leif M
2017-11-09 07:55 AM
That declaration is a non-ANSI extension of some compiler, IAR I guess.
0x424102A4 is the bit-band alias of bit 1 at 0x40020815 (or, better, bit 9 at 0x40020814). 0x40020800 is the base address of GPIOC in 'F2/'F4 (maybe some other families, but certainly not all), and 0x14 is the offset of the ODR register.
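The alias address can be checked with the standard Cortex-M3/M4 peripheral bit-band mapping (region at 0x40000000, alias at 0x42000000). This is only a sketch; the function and macro names are made up here and are not from any vendor header:

```c
#include <assert.h>
#include <stdint.h>

#define BB_PERIPH_REGION_BASE 0x40000000u  /* start of peripheral bit-band region */
#define BB_PERIPH_ALIAS_BASE  0x42000000u  /* start of its alias region           */

/* Hypothetical helper: alias word address for one bit of a peripheral byte/word. */
uint32_t bitband_periph_alias(uint32_t addr, uint32_t bit)
{
    /* The bit-band region covers the first 1 MB of peripheral space. */
    assert(addr >= BB_PERIPH_REGION_BASE &&
           addr <  BB_PERIPH_REGION_BASE + 0x100000u);
    assert(bit < 32u);
    /* Each bit maps to one 32-bit alias word: 32 alias bytes per region
     * byte, plus 4 alias bytes per bit number. */
    return BB_PERIPH_ALIAS_BASE + (addr - BB_PERIPH_REGION_BASE) * 32u + bit * 4u;
}
```

Both ways of naming the bit give the same alias: bit 9 at 0x40020814 and bit 1 at 0x40020815 each map to 0x424102A4.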
I wouldn't use bit-banding for GPIO, though; the BSRR mechanism is superior in every respect.
JW
2017-11-09 08:55 AM
I think I also mentioned the BSRR approach.
>> Another way to optimize is the use of bit-banding, because it doesn't affect another pins of the same port.
Optimize? You have 4 bus transactions in the loop.
Bit-banding is not efficient, and in the peripheral space it frequently induces hazards, e.g. with TIM->SR.
If you can get the address into a register, outside the loop, the compiler might do a better job even with optimization off.
    // volatile keeps the compiler from dropping or merging the two stores
    volatile uint32_t *bsrr = (volatile uint32_t *)&GPIOC->BSRR;
    while (1)
    {
        *bsrr = (1 << 9);        // Set bit 9
        *bsrr = (1 << (9 + 16)); // Reset bit 9
    }
To be reasonably optimal, you want the generated code to look like this:

    loop
        str r1,[r0, #0]
        str r2,[r0, #0]
        str r1,[r0, #0]
        str r2,[r0, #0]
        str r1,[r0, #0]
        str r2,[r0, #0]
        str r1,[r0, #0]
        str r2,[r0, #0]
        ..
        b loop
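One way to nudge a compiler toward a store sequence like the one above is to unroll the set/reset pairs by hand, with the register address and both masks computed once up front. This is only a sketch: PULSE_ONCE and pulse_burst4 are hypothetical names, the actual code generated depends on the compiler and its settings, and on a target you would pass &GPIOC->BSRR from the device header.

```c
#include <assert.h>
#include <stdint.h>

/* One set/reset pair: two back-to-back stores to the same register. */
#define PULSE_ONCE(reg, set_mask, reset_mask) \
    do { *(reg) = (set_mask); *(reg) = (reset_mask); } while (0)

/* Hypothetical helper: four pulses on `pin` with no branch between them. */
void pulse_burst4(volatile uint32_t *bsrr, uint32_t pin)
{
    assert(bsrr != 0 && pin < 16u);
    const uint32_t set_mask   = 1u << pin;          /* BSx bit */
    const uint32_t reset_mask = 1u << (pin + 16u);  /* BRx bit */

    PULSE_ONCE(bsrr, set_mask, reset_mask);
    PULSE_ONCE(bsrr, set_mask, reset_mask);
    PULSE_ONCE(bsrr, set_mask, reset_mask);
    PULSE_ONCE(bsrr, set_mask, reset_mask);
}
```

Calling pulse_burst4 inside while (1) then pays the branch cost once per four pulses instead of once per pulse.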
2017-11-09 09:06 AM
But most importantly, bit-banding is not present on the M7 core. It is only present on the M3 and M4 cores. You will not find it on M0 and M0+ either.
2017-11-09 05:23 PM
Benchmarking is fun, huh? :)
http://www.efton.sk/STM32/r.png
is the waveform for a pin toggle using BSRR, with the code (USE_BB was commented out). The top track is EVENTOUT, i.e. pulses imposed by SEV; the middle track is the output in question; the bottom track is the system clock (default 16 MHz in a 'F407; it was on a DISCO-F4).
http://www.efton.sk/STM32/r_bb.png
is the waveform for the same pin toggle using bit-banding; the code is the same with USE_BB uncommented, and the track arrangement is the same. Note that the processor, attempting the second write, had to wait until the first write was 'executed' by the bit-band insert (seen in the delay between the second and third SEV pulses), and as a consequence the resulting output pulse is longer.
Just for fun,
http://www.efton.sk/STM32/r3_DISDEFWBUF.png
is a comparison of the waveforms resulting from the very same first (BSRR-using) code, except that for the lower waveform set the processor write buffer has been switched off (using the DISDEFWBUF bit in SCB_ACTLR).
The thread discussing this is
https://list.hw.cz//pipermail/hw-list/2013-April/438309.html
but it's lengthy, with lots of detours, and probably only Martin will appreciate the language ;) One maybe surprising result of these experiments was the finding that one can't programmatically create a 2-cycle pulse... (discussed in the very last post of that thread).
JW