2013-07-31 06:43 PM
Howdy,
So I enabled DMA1 on an STM32F103, copying two 32-bit words (set and clear PF.15) from flash to GPIOF->BSRR at 25 kHz. I then run this code with all interrupts disabled:
while(1) {
    GPIOF->BSRR = 1 << 14;
    asm("nop"); asm("nop");
    GPIOF->BRR = 1 << 14;
    asm("nop"); asm("nop");
}
The length of the high pulse on PF.14 is around 175 ns when DMA is not active. When DMA fires and outputs a high pulse on PF.15, the pulse width on PF.14 sometimes drops to around 120 ns. I understand that the bus matrix uses round-robin scheduling and that sometimes the DMA or CPU will have to block, but I would expect blocking to increase a pulse length, not decrease it. What's happening here? Does this have anything to do with out-of-order execution, and do I need to use memory barrier instructions somewhere?
TIA
Andy
#gpio-dma
2013-08-01 02:14 AM
Much depends on details such as the system clock and the APB clock divider, but one possible scenario is that the DMA write occurs concurrently with (or just before; the details of arbitration are not described clearly in the available material) the processor's write for the PF14 leading edge. The leading edge is thus delayed, but because of the write buffer at the processor/AHB interface the processor continues to run with no wait states and writes the trailing edge at the "scheduled" time. If this is the case, you should see the total period unaffected, with the space between two pulses prolonged by the same amount the pulse is shortened. But I believe there might be other scenarios, too.
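The scenario above can be put into a toy timing model. This is only a sketch: the names `model_pulse`, `pulse_timing` and `write_lands_at` are made up for illustration, the 55 ns DMA hold time is chosen simply to match the 175 ns and 120 ns figures reported in the question, and real bus latencies are ignored. The `cpu_waits` flag models, for comparison, a core that waits for each write to land before it continues.

```c
#include <assert.h>

/* Toy model of one set/clear pair on a pin. Times are in ns and all
 * numbers are illustrative assumptions, not measured bus timings. */
typedef struct {
    int pulse_ns;      /* high time actually seen on the pin       */
    int next_issue_ns; /* when the CPU issues the next "set" write */
} pulse_timing;

/* A write reaches the pin when it is issued or when the bus frees up,
 * whichever is later. */
static int write_lands_at(int issue_ns, int bus_free_ns)
{
    return issue_ns > bus_free_ns ? issue_ns : bus_free_ns;
}

/* gap_ns:      CPU time between issuing two consecutive writes (the nops)
 * bus_free_ns: the DMA holds the bus until this time (0 = no DMA activity)
 * cpu_waits:   0 = buffered write, CPU runs on;
 *              1 = CPU waits for each write to land before continuing */
static pulse_timing model_pulse(int gap_ns, int bus_free_ns, int cpu_waits)
{
    /* The model only covers a DMA hold shorter than the gap between writes. */
    assert(gap_ns > 0 && bus_free_ns >= 0 && bus_free_ns < gap_ns);

    int set_issue = 0;
    int set_lands = write_lands_at(set_issue, bus_free_ns);
    int clr_issue = (cpu_waits ? set_lands : set_issue) + gap_ns;
    int clr_lands = write_lands_at(clr_issue, bus_free_ns);

    pulse_timing t;
    t.pulse_ns = clr_lands - set_lands;
    t.next_issue_ns = (cpu_waits ? clr_lands : clr_issue) + gap_ns;
    return t;
}
```

With these assumed numbers, `model_pulse(175, 55, 0)` gives a 120 ns pulse with the next iteration starting at the same time as in the undisturbed case, while `model_pulse(175, 55, 1)` keeps the pulse at 175 ns and pushes the next iteration later instead.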
The SoC character of the vast majority of 32-bitters (i.e. that they are a more or less loose concoction of a processor core, peripherals and interconnection fabric, rather than a microcontroller systematically designed for complete timing control) makes them completely unsuitable for such clock-level timing, to which you might have been accustomed from the 8/16-bitter world. This is the price you pay for both the high execution speed and the "high speed" (short design time) at which these chips are churned out to the market.
JW
2013-08-01 08:37 AM
Your analysis makes sense, as I did not measure any change in the period. Only some of the pulses that looked like they got preempted by the DMA were shortened. What is the best way to prevent this from happening? I guess a read of the same port after the write should force synchronization of the write buffer?
Also, the Bus Matrix document says that when more than one bus master "accesses" the "same" peripheral, one of them is blocked. Do you know what "same" peripheral means? Are APB1 and APB2 considered the same peripheral, or are they different peripherals?
Thanks
2013-08-01 09:29 AM
> What is the best way to prevent this from happening?
> I guess a read of the same port after the write should force synchronization of the write buffer?
Yes, but that might take longer than two nops, again depending on the particular circumstances.
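A minimal sketch of the read-back idea, for illustration only. The `gpio_regs` struct is a stand-in that mirrors the STM32F1 GPIO register order so the code also builds off-target, and `gpio_set_synced` / `gpio_clear_synced` are hypothetical helper names. On hardware you would use the device header's `GPIO_TypeDef` and `GPIOF` instead. The assumption is that the read from the same port cannot complete before the buffered write has gone out, so the CPU is held until then.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the STM32F1 GPIO register order (assumption: used
 * here only so the sketch compiles without the device header). */
typedef struct {
    volatile uint32_t CRL, CRH, IDR, ODR, BSRR, BRR, LCKR;
} gpio_regs;

/* Check that the stand-in puts BSRR and BRR at the expected offsets. */
static_assert(offsetof(gpio_regs, BSRR) == 0x10, "BSRR offset");
static_assert(offsetof(gpio_regs, BRR) == 0x14, "BRR offset");

/* Set pins, then read the same port back so the CPU cannot run ahead
 * of the write. Returns the value read from the input data register. */
static inline uint32_t gpio_set_synced(gpio_regs *port, uint32_t mask)
{
    port->BSRR = mask;
    return port->IDR; /* volatile read, not optimized away */
}

/* Same pattern for clearing pins through BRR. */
static inline uint32_t gpio_clear_synced(gpio_regs *port, uint32_t mask)
{
    port->BRR = mask;
    return port->IDR;
}
```

As noted in the reply, the read itself costs bus cycles, so the resulting pulse may well be longer than the two-nop version.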
> Also the Bus Matrix document says
Any link to that?
Thanks, JW
2013-08-01 10:34 AM
>> Also the Bus Matrix document says
> Any link to that?
I guess it didn't say "block". I was taking the term "access" to mean "blocked". I was reading "The Insider's Guide To The STM32" and the ST STM32F103 reference manual. Could the proper way to synchronize this be to use the "DMB" instruction?
Andy
2013-08-01 01:01 PM
Dear Gentlemen,
The DMB instruction has no effect on the DMA; it applies only to the Cortex-M3/M4 CPU. The most important point here: GPIOs on our STM32F1 are connected to the APB2 bus, and only one master, either the CPU or the DMA, can access that slave at a time. In your scenario the DMA access to APB2 takes longer and the CPU waits until it frees the bus; then your second GPIO is set/reset. I recommend having a look at our application note on the STM32F1 DMA here: http://www.st.com/web/en/resource/technical/document/application_note/CD00160362.pdf
Happy reading!
Cheers,
STOne-32.
2013-08-02 06:51 AM
> The DMB instruction has no effect on the DMA; it applies only to the Cortex-M3/M4 CPU.
The correct instruction to use is DSB. The goal here is to ensure that the pulse width intended by the two NOPs does not get shortened by DMA preemption. The DSB does that, as in the following code. Now my pulse only stretches when preempted by the DMA, but never shortens.
while(1) {
    GPIOF->BSRR = 1 << 14;
    asm("dsb"); asm("nop");
    GPIOF->BRR = 1 << 14;
    asm("dsb"); asm("nop");
}
Andy
2013-08-02 08:29 AM
I think ST-One's broader point is that playing fencing games with the core's write buffers is somewhat separate from the arbitration/contention on assorted buses, at assorted speeds, which occurs independently of the core.
Toss in some FSMC accesses, or an obtusely slow APB1 with contention, and you'll get some pretty interesting dynamics. Is there some merit to measuring the jitter on a toggling GPIO, other than illustrating the potentially complex interplay of various subsystems and implementations?
2013-08-02 09:01 AM
> I think ST-One's broader point is that playing fencing games with the core's write buffers is somewhat separate from the arbitration/contention on assorted buses, at assorted speeds, which occurs independently of the core.
The fact that the system is complex does not mean it should be handled by a handwave. Yes, I am pointing again to the inadequacy of the publicly available documentation as far as the ST-specific part of the issues mentioned (bus arbitration, out-of-processor buffering) is concerned. Not that the ARM-specific documentation is concise and complete, but it at least exists.
> Toss in some FSMC accesses, or an obtusely slow APB1 with contention, and you'll get some pretty interesting dynamics.
Again, that the timing is complex does not mean it is not deterministic and/or documentable, and thus exploitable when needed. In this particular case, it appears that the DMA unit accesses the APB-connected peripherals (including GPIO) through a single bus (for both APBs), shared with the processor, and that there is no buffer other than the single write buffer in the APB bridge (probably one for each bridge). That makes the processor "see" the bus contention at its internal write buffer when instructed to do so by DSB.
In the 'F2/'F4 the picture is different, more complex (the DMA units have a separate "feed" directly to the APB bridge), and again poorly documented. It's true that GPIOs are on AHB there, so the DSB would cure this particular problem there, too; but two writes to an APB peripheral might arrive closer together than anticipated, and there a readback would probably be the only remedy.
> Is there some merit to measuring the jitter on a toggling GPIO, other than illustrating the potentially complex interplay of various subsystems and implementations?
Of course there is, and you know it (perhaps it could be highlighted by renaming "jitter" to "minimum pulse length").
JW
2013-08-02 09:56 AM
Yes, I'm tossing some grenades, but this isn't your mother's Z80.
There is practically no amount of documentation you can have that comes close to a gate-level simulation of the impact and interaction of the various subsystems, caches, queues, buffers, branch prediction and prefetch units, and blobs like ART, which will alter behaviour based on code placement and flow. I'm not handwaving; this whole pondering on the implications of jamming a GPIO high and low is off in the weeds, and ignores the sensibility of doing so when creating signals in the time domain. It's a time-wasting exercise. You want hard, predictable time lines? You use a synchronous timer or counter tasked to do that, not something that takes pseudo-random input and layers it on top.
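To make the timer suggestion concrete, here is a rough sketch of the arithmetic for a hardware PWM output. Nothing in it comes from the thread except the 25 kHz rate and the 175 ns pulse: the 72 MHz timer clock is an assumption, `ticks_from_ns`, `pwm_settings` and `pwm_for` are hypothetical names, and the sketch assumes an up-counting PWM mode where the counter runs from 0 to the reload value and the output is high while the counter is below the compare value.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a duration in ns to timer ticks at clk_hz, rounded to nearest. */
static uint32_t ticks_from_ns(uint32_t clk_hz, uint32_t ns)
{
    uint64_t ticks = ((uint64_t)clk_hz * ns + 500000000u) / 1000000000u;
    assert(ticks <= UINT32_MAX);
    return (uint32_t)ticks;
}

typedef struct {
    uint32_t reload;  /* auto-reload value: period in ticks minus one */
    uint32_t compare; /* compare value: high time in ticks            */
} pwm_settings;

/* Work out reload and compare values for a given period and high time. */
static pwm_settings pwm_for(uint32_t clk_hz, uint32_t period_ns,
                            uint32_t high_ns)
{
    pwm_settings s;
    uint32_t period_ticks = ticks_from_ns(clk_hz, period_ns);

    assert(period_ticks >= 1u);
    s.reload = period_ticks - 1u;
    s.compare = ticks_from_ns(clk_hz, high_ns);
    assert(s.compare <= period_ticks);
    return s;
}
```

With the assumed 72 MHz clock, `pwm_for(72000000u, 40000u, 175u)` yields a reload of 2879 and a compare of 13. Thirteen ticks is about 180.6 ns, so the pulse is quantized to the roughly 13.9 ns tick, but the edges come from the timer hardware rather than from CPU writes racing the DMA for the bus.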