Why is DMA reducing my pulse width?

bikejog · ‎2013-07-31

Posted on August 01, 2013 at 03:43

Howdy,

So I enabled DMA1 on an STM32F103, copying two 32-bit words (set and clear PF.15) from flash to GPIOF->BSRR at 25 kHz. I then run this code with all interrupts disabled:

while(1) {
    GPIOF->BSRR = 1 << 14;   /* set PF.14 */
    asm("nop");
    GPIOF->BRR = 1 << 14;    /* clear PF.14 */
    asm("nop");
}

The high pulse on PF.14 is around 175 ns long when DMA is not active. When DMA fires and outputs a high pulse on PF.15, the pulse width on PF.14 sometimes drops to around 120 ns. I understand that the bus matrix uses round-robin scheduling and that the DMA or CPU will sometimes have to block, but I would expect blocking to increase a pulse length, not decrease it. What's happening here? Does this have anything to do with out-of-order execution, and do I need to use memory barrier instructions somewhere?
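
To put those two pulse widths in clock cycles, here is a minimal sketch in C. It assumes a 72 MHz core clock, which is the F103's maximum but is not stated anywhere in this post; ns_to_cycles is a hypothetical helper.

```c
#include <assert.h>

/* Convert a duration in nanoseconds to core clock cycles.
 * core_mhz is an ASSUMPTION supplied by the caller; the post
 * does not say what clock the part is running at. */
static double ns_to_cycles(double ns, double core_mhz)
{
    assert(core_mhz > 0.0);
    /* MHz * ns = 1e6 * 1e-9 = 1e-3, hence the division by 1000 */
    return ns * core_mhz / 1000.0;
}
```

At the assumed 72 MHz, ns_to_cycles(175, 72) is about 12.6 cycles and ns_to_cycles(120, 72) about 8.6, so the pulse gets shorter by roughly 4 cycles.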

TIA

Andy

#gpio-dma
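
For reference, the two DMA words described in the post above can be sketched as follows. The sketch relies on the BSRR layout from RM0008 (bits 0..15 set a pin, bits 16..31 reset it, and set wins if both are given). pf15_pulse and apply_bsrr are hypothetical names, and apply_bsrr is only a host-side model of the output latch, not a hardware access.

```c
#include <stdint.h>

/* Two words for the DMA to copy to GPIOF->BSRR, in order:
 * the first sets PF.15, the second resets it. */
static const uint32_t pf15_pulse[2] = {
    (uint32_t)1u << 15,         /* BS15: drive PF.15 high */
    (uint32_t)1u << (15 + 16),  /* BR15: drive PF.15 low  */
};

/* Host-side model of what one BSRR write does to the 16-bit
 * output data register. Not a register access. */
static uint32_t apply_bsrr(uint32_t odr, uint32_t word)
{
    uint32_t set   = word & 0xFFFFu;  /* lower half: bits to set   */
    uint32_t reset = word >> 16;      /* upper half: bits to reset */
    return ((odr & ~reset) | set) & 0xFFFFu;  /* set has priority */
}
```

Because each BSRR write touches only the bits named in the word, the DMA's writes to PF.15 never disturb the PF.14 latch; any interaction with the CPU loop has to come from bus timing.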

bikejog · ‎2013-08-02

Posted on August 02, 2013 at 19:45

So I guess the only way to do a minimum pulse width is to write GPIO, read GPIO, then NOPs?

Andy

Tesla DeLorean · ‎2013-08-02

Posted on August 02, 2013 at 22:42

In RAM, with no interrupts, etc.


bikejog · ‎2013-08-02

Posted on August 03, 2013 at 04:42

>> In RAM, with no interrupts, etc

You've lost me. What's in RAM? Code executing out of RAM? And how can interrupts affect writing to or reading from GPIO? Can you suggest some reading on this topic?

TIA

Andy

bikejog · ‎2013-08-04

Posted on August 04, 2013 at 14:49

I found the Arm documentation where it states that you can't use NOPs for timing. My questions then are

1. Are APB1 and APB2 considered to be the same slave? If one master accesses APB1 and another accesses APB2 at the same time, would those requests be queued or would both be handled simultaneously?

2. If I do a write to any peripheral register (including GPIO), then I read the same peripheral (maybe a different register but same peripheral) back, is it guaranteed that the read will not execute until the write completes and that the next instruction (sans NOP) following the read instruction would not be executed until the read completes?

3. How many cycles does it cost to read/write from/to the bus matrix and the peripherals?

TIA

Andy

waclawek.jan · ‎2013-08-05

Posted on August 05, 2013 at 10:47

> I'm tossing some grenades,

No worry, some deep trenches here... ;)

> but this isn't your mothers Z80.

No; but it isn't any rocket science either.

> There is practically no amount of documentation...

I beg to disagree. For most of the practical purposes, much less information is needed, but it has to be concise and presented in an organized way. Unfortunately, these days it appears that the art of documentation writing is mostly replaced by extensive copy-paste-ing and heavy use of wordprocessor templates, inserting inordinate amounts of whitespace to impress the audience by page numbers.

I am paying for every transistor in the chip, so I want to know what they are doing.

> [...] this whole pondering on the implications of jamming a GPIO high-and-low

Issuing a valid latch pulse of a minimum few tens of ns duration is IMO one of the most usual tasks of a microcontroller in a common mcu-based system. That's, of course, not the only place where timing is involved. Setup and hold times, various sampling timing, and who knows what else, are all examples of where instruction-timed sequencing might be desirable. Yes, timers might provide hard timing, but that's not always a suitable or desirable method. An engineer has to have a toolbox full of tools, and choose the appropriate one. Yes, nails can be hammered in using a microscope, too.

And there's more. To write efficient code, one needs to know where the potential bottlenecks are, and what resources are to be used in what way. Look, for example, at the 'F2/'F4 execution-from-RAM "benchmark"; IMO it contains surprises for the unaware.

More. A few days ago I wrote here about a gotcha involving timing between two internal components and some buses. To understand the nature of that gotcha, I had to use exactly the kind of information that now has to be gathered painfully from snippets spread throughout the incoherent documentation.

And there's more and more.

I repeat: none of this is rocket science. It's just a matter of things being done properly.

JW

waclawek.jan · ‎2013-08-05

Posted on August 05, 2013 at 16:21

> I found the Arm documentation where it states that you can't use NOPs for timing.

It's not that you *can't* use it for timing, it's just that under certain circumstances it may not produce the expected delay. However, this is just another poorly documented piece.

> 1. Are APB1 and APB2 considered to be the same slave?

According to RM0008, they both are accessed from the processor and DMA through a single row in the matrix, ie. yes.

> 2. If I do a write to any peripheral register (including GPIO), then I read the same peripheral (maybe a different register but same peripheral) back, is it guaranteed that the read will not execute until the write completes and that the next instruction (sans NOP) following the read instruction would not be executed until the read completes?

Roughly, yes. There may be some nuances involving subsequent LDs and STs, but they are IMO not relevant for this case.

> 3. How many cycles does it cost to read/write from/to the bus matrix and the peripherals?

From the processor's perspective, a write is one cycle as long as the write buffer is empty; a read is one cycle plus what the real read takes (but again there are nuances where a cycle may be spared from a subsequent LD/ST). Now, how long the write takes to arrive at the pin depends on the AHB bus state (whether it is currently owned by the processor or not, and whether a transfer is already in progress on it), the AHB/APB clock ratio, the current state of the APB clock divider, and there may be additional delay imposed by the peripheral itself. Read is similar. None of this is documented; again, only snippets of information are available.

JW

bikejog · ‎2013-08-05

Posted on August 05, 2013 at 19:30

>It's not that you *can't* use it for timing, it's just that under certain circumstances it may not produce the expected delay. However, this is just another poorly documented piece.

That is the problem: under what circumstances does it not work? Are those "circumstances" always predictable?

> Roughly, yes. There may be some nuances involving subsequent LDs and STs, but they are IMO not relevant for this case.

I guess I should have asked my question another way. If I have 3 instructions:

1. Write peripheral register

2. Read peripheral register from same peripheral as #1

3. some other instruction

then when instruction 3 executes, will the peripheral have received the value written in instruction 1 (in the example of GPIO, will the pin have changed state) under all conditions?

> 3. How many cycles does it cost to read/write from/to the bus matrix and the peripherals?

I'm actually more interested in how many cycles are consumed in the bus matrix. I'm trying to figure out the maximum number of cycles that the CPU can hold up the DMA.

Andy

waclawek.jan · ‎2013-08-07

Posted on August 07, 2013 at 09:22

>>It's not that you *can't* use [NOP] for timing, it's just that under certain circumstances it may not produce the expected delay. However, this is just another poorly documented piece.

> That is the problem, under what circumstances does it not work?

I could not find that out. In every experiment of mine so far, NOP appears to consume time.

IMO, the "NOP being purged from the pipeline" behavior is an ARM option which was not enabled for the STM32s. I'd love to hear ST-One's comments on this.

> If I have 3 instructions:

> 1. Write peripheral register

> 2. Read peripheral register from same peripheral as #1

> 3. some other instruction

> then when instruction 3 executes, the peripheral would have gotten the value written in

> instruction 1 (in the example of GPIO, the pin would have changed state) under all conditions?

Peripheral transactions are not reordered and are all executed, so by the time the processor receives the value from the read in step 2, the register has surely been written. Now, the GPIO might impose additional delay between the register write and the actual pin change, and instruction 3 might get executed while waiting for the result of instruction 2, if it does not depend on it. I don't believe either of these is happening in the STM32.
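
The write / read-back / continue sequence discussed here can be sketched like this. It is a minimal sketch, not a method confirmed by ST: gpio_regs_t merely copies the STM32F1 GPIO register order from RM0008 so that the code also compiles on a host, and pulse_with_readback is a hypothetical helper. On target you would pass the real port (e.g. GPIOF, cast to this type) instead of a plain struct.

```c
#include <assert.h>
#include <stdint.h>

/* Register order of one STM32F1 GPIO port, per RM0008.
 * A stand-in so the sketch builds off-target. */
typedef struct {
    volatile uint32_t CRL, CRH, IDR, ODR, BSRR, BRR, LCKR;
} gpio_regs_t;

/* Hypothetical helper: pulse one pin, reading the same peripheral
 * back after each write. Returns the IDR value read after the set. */
static inline uint32_t pulse_with_readback(gpio_regs_t *port, unsigned pin)
{
    assert(pin < 16u);
    uint32_t mask = (uint32_t)1u << pin;

    port->BSRR = mask;          /* 1. write: set the pin               */
    uint32_t seen = port->IDR;  /* 2. read the same peripheral back    */
    port->BRR = mask;           /* 3. next instruction: clear the pin  */
    (void)port->IDR;            /* read back again after the clear     */
    return seen;
}
```

On a host the struct is just memory, so this only shows the ordering of the accesses. On the real part, the read in step 2 is what is expected to hold the core until the preceding write has reached the peripheral; any extra delay inside the GPIO block itself is not covered.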

>> 3. How many cycles does it cost to read/write from/to the bus matrix and the peripherals?

>

> I'm actually more interested in how many cycles are consumed in the bus matrix. I'm trying to

> figure out the maximum number of cycles that the CPU can hold up the DMA.

This is what IMO is not easy to answer without additional information from ST. The appnote ST-One referred to above should contain some of it, but IMO it is not clearly formulated and just adds to the confusion. Also, it depends on the usage scenario from the CPU - will the CPU perform multiple reads/writes to the peripherals in a row?

JW

dthedens23 · ‎2013-08-07

Posted on August 07, 2013 at 17:12

something to replace nop

mov r0,r0
