
Timer DMA requests through DMA1 clarification

con3
Senior
Posted on January 24, 2018 at 10:32

Hey everyone,

I need some clarification on this, as it will help my program run smoother.

I want to run two DMAs simultaneously: one that sends data out through SDMMC1, which is linked to DMA2, and one on DMA1 that reads data in, triggered by one of the timer peripherals that can issue a DMA request for a GPIO-to-memory transfer. An example would be Timer 2 on DMA1 stream 1.

0690X00000604I3QAI.jpg

It doesn't seem like I can read in data from the GPIO to memory using DMA1, as seen in the image below from application note an 4. I've read on some forums that this could be because there is no direct connection between the DMA1 P and the bus matrix, so it would require me to go through a direct path. Is this possible? I haven't been able to get this going.

0690X00000604IIQAY.jpg

I'd like to have the two DMAs running independently to avoid a clash.

Thanks in advance for any help.

dma-gpio stm32f7-dma stm32f7

17 REPLIES 17
Posted on February 06, 2018 at 15:42

Hi

Waclawek.Jan

and

Turvey.Clive.002

,

I'm still struggling to see how my processes will interfere with the DMA. I understand it's a non-trivial task, but even after reading through AN4031 and the manuals, it seems like there are too many factors to know whether clashes would be detrimental to my data integrity.

I'm hoping you could possibly help me or point me in the right direction (It's entirely possible I missed something).

My process is as follows. I'm using two timers and three DMA streams. My STM32F722ZE will run as a slave for the ADC and as the master for the SDMMC card.

Because of this, and as recommended by AN4031, I've made sure that the DMA stream servicing the ADC has the highest priority (both software and hardware).

Configuration:

For the ADC: I've selected:

DMA2 Stream 1 at a very high priority. This is a peripheral to memory request

For the SDMMC:

SDMMC TX has DMA2 stream 3, priority is set as high.

SDMMC RX has DMA2 stream 6, priority is set as low.

Program flow:

I’ll have one timer channel that receives a clock input at 20 MHz, with an output compare set to trigger every time 8 clocks have been received on the input. This will be Timer 8, which has an internal clock running at 216 MHz.

Every time the output toggles, it will be fed into an input capture of Timer 1.

Timer 1 will then trigger a DMA request on every rising and falling edge to move the data from GPIOF’s input data register to a FIFO configured with a half-word data width and a 4-beat incremental burst. Its threshold will be set to half full.

So essentially a 2.5 MHz signal will trigger DMA2 Stream 1 with a very high priority to store data to the FIFO and, when the threshold is reached, the CPU will move this to a buffer that I’ve defined in SRAM.

Every time 192 half-words have been received, the SDMMC will need to write them to an open file on the SD card. The microSD card is an SDHC UHS-I Class 10 card and can transfer up to 60 MB/s. My SDMMC clock is set to run at 48 MHz, and when measuring with an oscilloscope I can see the clock line running at 24 MHz. Transfers will use 4-bit SDIO mode. This DMA stream also uses the FIFO, with its data width set to word and a 4-beat incremental burst when the threshold is full.

My concern is that I never want to miss data from the ADC arriving via input capture on Timer 1, so whenever DMA2 is triggered I want to ensure that it collects the data before the next trigger. At 2.5 MHz, that leaves a gap of 400 nanoseconds between incoming samples. I therefore need the SDMMC to write out the data as it comes in so that memory doesn’t become a bottleneck, while also ensuring that data is read in at every trigger on Timer 1 so that samples aren’t missed.
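As a rough sanity check of the numbers above, here is a minimal sketch in C that derives the timing budget from the figures in this post. The macro and function names are made up for illustration, and the sketch only does arithmetic; it says nothing about bus contention.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the post; all names here are illustrative only. */
#define INPUT_CLOCK_HZ     20000000u /* clock fed into Timer 8 */
#define CLOCKS_PER_SAMPLE  8u        /* one DMA trigger per 8 input clocks */
#define BYTES_PER_SAMPLE   2u        /* one half-word read from GPIOF */
#define SAMPLES_PER_BLOCK  192u      /* half-words gathered before each SD write */

/* DMA trigger rate in Hz. */
uint32_t sample_rate_hz(void)
{
    return INPUT_CLOCK_HZ / CLOCKS_PER_SAMPLE;
}

/* Time available to service one trigger, in nanoseconds. */
uint32_t sample_period_ns(void)
{
    uint32_t rate = sample_rate_hz();
    assert(rate > 0);
    return 1000000000u / rate;
}

/* Sustained rate the SD card path has to absorb, in bytes per second. */
uint32_t data_rate_bytes_per_s(void)
{
    return sample_rate_hz() * BYTES_PER_SAMPLE;
}

/* Time it takes to fill one block of samples, in nanoseconds. */
uint32_t block_period_ns(void)
{
    return SAMPLES_PER_BLOCK * sample_period_ns();
}
```

With these figures the functions give a 2.5 MHz trigger rate, 400 ns per sample, 5,000,000 bytes per second of sustained data, and 76.8 µs to fill one block of 192 half-words.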

I understand that two requests occur, one on the peripheral side and one on the memory side, and that the FIFO plays a role between requests. I also understand the arbitration process. I'm also assuming that on the peripheral side the DMA streams can work simultaneously as they're accessing different locations, namely AHB1 for GPIOF and APB2 for SDMMC1, although memory locations can't be accessed simultaneously.

This seems non-trivial from the AN4031 application note, and it is very difficult to actually determine in practice.

How would I go about determining whether data integrity would actually be affected?

Thank you for all the help thus far. I’ve really appreciated the help and support during this steep learning curve. If I can figure this out, I can move on to the PCB assembly.

Posted on February 06, 2018 at 18:25

The DMA interactions will be both complex and dynamic, going to be quite hard to model.

You have one sample time on the ADC to pull the data via DMA to avoid an overrun, the ADC determines the sample point, and sample time. Would expect some error to be flagged if you run out of bandwidth. Reading GPIO pins is likely to have significantly more variability/jitter due to the inability to control the sample point internally.

The SDMMC can run faster than 48 MHz, on the L4+ I can move it off the USB48 source.

The size of the data blocks written to the SD Card will be critical. Non-aligned blocks will require additional IO at the FatFs level, and also spanning blocks (larger erase ones) on the media will also slow writes significantly.

You really need a large 'spill' buffer where you can accumulate >32KB (ie 32KB + your buffering unit) and then write whole 32KB blocks, shifting down the excess. You might want to tune the size higher/lower to find the true sweet-spot, but 32KB is a good initial shot. I would do this in the foreground, or a singular thread (SDIO+FATFS), that flushes data to the media. Other buffer management being done via DMA interrupts, etc.

Benchmark card performance so you know what it is, what's achievable may differ from what's written on the label. From a user technical support perspective having some benchmark test in the product will save many hours of wasted time dealing with dodgy cards an end user bought from the least-cost provider. If you need a card to deliver 5 MBps write speeds, having a quick way to flag this will save a lot of time arguing.
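A minimal sketch of what such a built-in check could look like, in C. The two helper names are hypothetical, and the timing calls in the trailing comment assume the STM32 HAL tick (`HAL_GetTick`) and FatFs (`f_write`, `f_sync`) are available on the target; only the arithmetic is compiled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Average write rate in bytes per second, from a byte count and the elapsed
 * time in milliseconds. Returns 0 if the interval was too short to measure. */
uint32_t write_rate_bytes_per_s(uint32_t bytes_written, uint32_t elapsed_ms)
{
    if (elapsed_ms == 0)
        return 0;
    uint64_t rate = ((uint64_t)bytes_written * 1000u) / elapsed_ms;
    return rate > UINT32_MAX ? UINT32_MAX : (uint32_t)rate;
}

/* True when the measured rate meets the required sustained rate. */
bool card_is_fast_enough(uint32_t bytes_written, uint32_t elapsed_ms,
                         uint32_t required_bytes_per_s)
{
    return write_rate_bytes_per_s(bytes_written, elapsed_ms) >= required_bytes_per_s;
}

/* Intended use on the target (assumption, not compiled here):
 *
 *   uint32_t start = HAL_GetTick();
 *   for (i = 0; i < BLOCKS; i++)
 *       f_write(&fil, buffer, 32768, &byteswritten);
 *   f_sync(&fil);
 *   uint32_t elapsed = HAL_GetTick() - start;
 *   ok = card_is_fast_enough(BLOCKS * 32768u, elapsed, 5000000u);
 */
```

Writing a few megabytes in whole 32KB blocks keeps the millisecond tick resolution from dominating the result; the block count and the 5,000,000 bytes per second threshold are placeholders to tune for the actual product.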

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
Posted on February 06, 2018 at 18:41

The DMA interactions will be both complex and dynamic, going to be quite hard to model.

+1

Unless you are in the $$$$$$$-buyer category where ST is gladly willing to fire up an expensive simulation for you, I'm afraid you have to resort to rules of thumb as we other mortals do.

Why don't you conduct an experiment on an appropriate F7 DISCO with SD card holder

http://www.st.com/content/st_com/en/products/evaluation-tools/product-evaluation-tools/mcu-eval-tools/stm32-mcu-eval-tools/stm32-mcu-discovery-kits/32f746gdiscovery.html

  (or wire up one onto a Nucleo - I am not sure about whether this is a good idea)? You'd get a feeling for the things at least. You can simulate the ADC's working using one or a couple of timers, simulating the ADC's output by one output of a timer; by changing the delay between sample clock and the simulated ADC output you can observe whether you have enough headroom in the timing of the whole machine.

JW

Posted on February 06, 2018 at 19:19

Unless you are in the $$$$$$$-buyer category where ST is gladly willing to fire up an expensive simulation for you, I'm afraid you have to resort to rules of thumb as we other mortals do.

I'm only a student so rules of thumb will have to work!

Why don't you conduct an experiment on an appropriate F7 DISCO with SD card holder

http://www.st.com/content/st_com/en/products/evaluation-tools/product-evaluation-tools/mcu-eval-tools/stm32-mcu-eval-tools/stm32-mcu-discovery-kits/32f746gdiscovery.html

(or wire up one onto a Nucleo - I am not sure about whether this is a good idea)? You'd get a feeling for the things at least. You can simulate the ADC's working using one or a couple of timers, simulating the ADC's output by one output of a timer; by changing the delay between sample clock and the simulated ADC output you can observe whether you have enough headroom in the timing of the whole machine.

This seems like a good idea, I'll test it out tomorrow.

One alternative would be to run the sdcard without DMA and have the ADC use the DMA with double buffer mode. That should allow more leeway for the system.

Thank you for all the help!

Posted on February 06, 2018 at 19:40

Thank you for the reply Clive,

I really appreciate it but I have a few questions.

You have one sample time on the ADC to pull the data via DMA to avoid an overrun, the ADC determines the sample point, and sample time. Would expect some error to be flagged if you run out of bandwidth. Reading GPIO pins is likely to have significantly more variability/jitter due to the inability to control the sample point internally.

I don't know if any error would be flagged if the DMA misses the sample? If the sdmmc unit takes too long to copy over, would that not just result in corrupted data without any error being flagged? I have error callbacks for DMA failures but these seem to only trigger if the DMA fails, not when it misses a sample?

The SDMMC can run faster than 48 MHz, on the L4+ I can move it off the USB48 source.

The size of the data blocks written to the SD Card will be critical. Non-aligned blocks will require additional IO at the FatFs level, and also spanning blocks (larger erase ones) on the media will also slow writes significantly.

You really need a large 'spill' buffer where you can accumulate >32KB (ie 32KB + your buffering unit) and then write whole 32KB blocks, shifting down the excess. You might want to tune the size higher/lower to find the true sweet-spot, but 32KB is a good initial shot. I would do this in the foreground, or a singular thread (SDIO+FATFS), that flushes data to the media. Other buffer management being done via DMA interrupts, etc.

This portion has got me slightly confused. I think this is what is meant here:

0690X00000604LbQAI.jpg

As in the drawing, I should use two buffers: when Buffer 1 (32KB) is full it is sent to the SD card, while Buffer 2 gets filled via the DMA with every input capture? Wouldn't this still be an issue, as the DMA will be busy sending data out to the SDMMC and will miss triggers to load data into Buffer 2 from the input capture? I might be completely misunderstanding what you meant. I'm not sure if you're hinting at an RTOS with the singular thread. But if there's any way that I can get Buffer 2 to fill while Buffer 1 is being emptied, I'd be extremely happy.

EDIT: Even if I use DMA2 only to read in data from the input capture and use the CPU rather than DMA2 for the SD card, won't I still have collisions between the CPU and DMA2 accessing RAM? I can write out large pieces of data to the SD card (32KB), but won't I still miss samples when the CPU transfers data from RAM to the SD card?

Benchmark card performance so you know what it is, what's achievable may differ from what's written on the label. From a user technical support perspective having some benchmark test in the product will save many hours of wasted time dealing with dodgy cards an end user bought from the least-cost provider. If you need a card to deliver 5 MBps write speeds, having a quick way to flag this will save a lot of time arguing.

Is there any way that I can benchmark my card's performance (as a student with limited resources)? I've seen this in a few threads with a detailed benchmark of the SD card read and write performance.

Thank you in advance for any help!

Posted on February 06, 2018 at 23:20

It's not clear you'll miss samples. The DMA+CPU share the resources, and take turns, contention is at a unit transfer level. Both ADC and MMC would throw under/over run errors if the peripheral isn't serviced in a timely manner.

The idea of the spill buffer is to try to reduce the number of unnecessary copies. Ideally you'd have a pair of equally sized ping-pong buffers, which alternate so you can write the inactive half.

With 192 samples (384 bytes), that doesn't fit cleanly into 32768 (you want binary multiples, matching the blocking device)

A spill buffer would describe 32768 + 384 bytes, every time you have 384 bytes ready you copy into the buffer. Logic detects >= 32768 available and flushes that to the media. You then look at the excess, pulling it to the front of the buffer.

while (queueddata)
{
  fill_level += dequeue(buffer + fill_level); // appends 384 bytes, returns the count added

  if (fill_level >= 32768)
  {
    flush_to_disk(buffer, 32768); // ie f_write(&fil, buffer, 32768, &byteswritten)

    fill_level = fill_level - 32768; // only adjust after an actual flush

    if (fill_level) // Pull down the excess (under 384 bytes, so no overlap)
      memcpy(buffer, buffer + 32768, fill_level);
  }
}
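For trying the idea on a PC before moving it to the target, the same spill-buffer logic can be sketched as a small self-contained C module. The type and function names below are hypothetical, and the actual write of the first 32768 bytes (for example with `f_write`) is left to the caller between `spill_push` and `spill_pull_down`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLUSH_SIZE 32768u /* size of each write to the media */
#define UNIT_SIZE  384u   /* 192 half-words per incoming unit */

typedef struct {
    uint8_t data[FLUSH_SIZE + UNIT_SIZE]; /* one flush block plus one unit of spill */
    size_t  fill;                         /* bytes currently held */
} spill_buffer;

void spill_init(spill_buffer *sb)
{
    sb->fill = 0;
}

/* Append one unit. Returns 1 when data[0..FLUSH_SIZE) is ready to be written. */
int spill_push(spill_buffer *sb, const uint8_t *unit, size_t len)
{
    assert(len <= UNIT_SIZE);
    assert(sb->fill < FLUSH_SIZE); /* caller must flush before pushing again */
    memcpy(sb->data + sb->fill, unit, len);
    sb->fill += len;
    return sb->fill >= FLUSH_SIZE;
}

/* Call after the first FLUSH_SIZE bytes have been written out:
 * moves the excess to the front of the buffer. */
void spill_pull_down(spill_buffer *sb)
{
    assert(sb->fill >= FLUSH_SIZE);
    sb->fill -= FLUSH_SIZE;
    if (sb->fill)
        memmove(sb->data, sb->data + FLUSH_SIZE, sb->fill);
}
```

With 384-byte units, 85 pushes leave 32640 bytes in the buffer, the 86th push crosses 32768 and signals a flush, and 256 bytes are carried over to the front afterwards.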

Posted on February 07, 2018 at 08:03

Hi Clive,

Thank you for the help. I'll do this. After looking at the reference manual I can see that I'll get a FIFO error, or a direct mode error if the FIFO isn't enabled, if I miss a sample (i.e. the DMA is triggered multiple times without getting access to the bus matrix).

I may have a fallback solution that I've thought of in case I test this and it's not fast enough (which I'll be doing today or tomorrow).

Could I use the double buffer mode with one buffer in SRAM1 and one buffer in SRAM2, and by doing this have concurrent reads and writes?

0690X00000604LlQAI.jpg

From this image from the datasheet, it looks like the CPU would be able to read from SRAM1 while the DMA is writing to SRAM2, without any collisions occurring.

Posted on February 12, 2018 at 19:50

Hi

Turvey.Clive.002

and

Waclawek.Jan

,

Just wanted to follow up and say I tested it like this:

Why don't you conduct an experiment on an appropriate F7 DISCO with SD card holder

http://www.st.com/content/st_com/en/products/evaluation-tools/product-evaluation-tools/mcu-eval-tools/stm32-mcu-eval-tools/stm32-mcu-discovery-kits/32f746gdiscovery.html

(or wire up one onto a Nucleo - I am not sure about whether this is a good idea)? You'd get a feeling for the things at least. You can simulate the ADC's working using one or a couple of timers, simulating the ADC's output by one output of a timer; by changing the delay between sample clock and the simulated ADC output you can observe whether you have enough headroom in the timing of the whole machine.

JW

I ran a trigger at 2.5 MHz to an input capture channel with a DMA double buffer. Every time one buffer was full I'd call f_write in the callback to send it out while the other buffer was being filled. It worked perfectly: I had no transfer error callbacks due to FIFO overrun and saw the expected data in the output.

Thank you for all the help!
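For anyone repeating this test, one possible way to confirm there is headroom is to track the buffer handoff explicitly. The sketch below is a hypothetical bookkeeping helper in C, not the code used in the test above: it assumes the DMA transfer-complete callback for each of the two buffers calls `pingpong_on_buffer_full`, and that whoever writes the buffer out calls `pingpong_release` afterwards. All names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_WORDS_PER_BUFFER 192u

typedef struct {
    uint16_t buf[2][HALF_WORDS_PER_BUFFER]; /* the two DMA target buffers */
    volatile int ready;                     /* buffer waiting to be written, or -1 */
    volatile unsigned overruns;             /* times a buffer filled before release */
} pingpong;

void pingpong_init(pingpong *pp)
{
    pp->ready = -1;
    pp->overruns = 0;
}

/* Call from the DMA transfer-complete callback for buffer `index` (0 or 1). */
void pingpong_on_buffer_full(pingpong *pp, int index)
{
    assert(index == 0 || index == 1);
    if (pp->ready != -1)
        pp->overruns++; /* the previous buffer had not been written out yet */
    pp->ready = index;
}

/* Returns the buffer that should be written out, or NULL if none is pending. */
const uint16_t *pingpong_pending(const pingpong *pp)
{
    int index = pp->ready;
    return index < 0 ? NULL : pp->buf[index];
}

/* Call once the pending buffer has been written to the card. */
void pingpong_release(pingpong *pp)
{
    pp->ready = -1;
}
```

If `overruns` stays at zero over a long run, each write finished before the DMA wrapped back around to the buffer being written. On the target, the callback and the release would need protecting against each other, for example by briefly masking the DMA interrupt.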