DMA GPIO->SRAM test is not giving expected results

adrian5
Associate II
Posted on December 09, 2009 at 05:55


14 REPLIES
adrian5
Associate II
Posted on May 17, 2011 at 13:33

I have built the GPIO_SRAM demo project described in AN2548 and configured PA.06 (the capture clock signal) as an input for an external data source, as described in the readme. To test it, I have a variable clock source (a digital signal generator) and an 8-bit counter chip on Port B. The counter increments on the rising clock edge, and I have configured the timer pin to trigger on the falling edge. All waveforms look good at the chip. I've remembered to remap a couple of the Port B pins from their JTAG alternate functions, since AN2548 used Port D and my chip is a 64-pin STM32F103RBT6.

When I look in the buffer I see the count increment perfectly for all clock frequencies under around 400 kHz, but above that counts start to go missing. I was expecting to get 18 Mbytes/s (72 MHz / 4), not 0.4!

I am using my own PCB with the serial flash loader. The project is built in the Ride environment. In main, after the supplied setup code, I poll the half-full flag and disable the DMA when it is set. Then I print the buffer contents to a terminal program over the UART, so I can see a snapshot of the buffer as it filled. What might be preventing me from getting the 18 Mbytes/s I expected?

adrian5
Associate II
Posted on May 17, 2011 at 13:33

To show what's happening, I've changed the DMA mode to circular and let the buffer fill. After a short while the DMA is disabled and some of the buffer is read out. Here's some cut'n'paste showing the sequence at 400 kHz (when it works OK):

Code:

DMA enabled. Buffer filling in circular mode while printf prints this message

DMA disabled now. Here are the first 128 values in the buffer:

59 5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78

79 7A 7B 7C 7D 7E 7F 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98

99 9A 9B 9C 9D 9E 9F A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF B0 B1 B2 B3 B4 B5 B6 B7 B8

B9 BA BB BC BD BE BF C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF D0 D1 D2 D3 D4 D5 D6 D7 D8

At around 420 kHz it starts to go off the rails:

Code:

DMA enabled. Buffer filling in circular mode while printf prints this message

DMA disabled now. Here are the first 128 values in the buffer:

46 44 49 4A 4B 4C 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 61 62 64 65 66

67 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7F 40 81 82 83 83 84 85

86 87 88 89 8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98 99 9B 9C 9D 9E 9F 9F A0 A1 A2 A3 A4 A5

A6 A7 A8 A9 AA AB AC AD AE AF B0 B1 B2 B3 B4 B5 B6 B7 B9 BA BB BC BC BD BE BF C0 C1 C2 C3 C4 C5

At 1 MHz it's lost it completely:

Code:

DMA enabled. Buffer filling in circular mode while printf prints this message

DMA disabled now. Here are the first 128 values in the buffer:

9F A1 A2 A3 A5 A6 A7 A9 AA AB AC AE AF B0 B2 B3 B4 B6 B7 B8 BA BB BC BC BF C0 C1 C3 C4 C5 C7 C8

C9 CA CC CD CE D0 D1 D2 D4 D5 D6 D4 D9 DA DA DD DE DF E1 E2 E3 E5 E6 E7 E8 EA EB EC EE EF F0 F2

F3 F4 F6 F7 F8 FA FB FC FD FF 00 01 03 04 05 06 08 09 0A 0C 0D 0E 00 11 12 14 15 16 16 19 1A 1B

1D 1E 1F 21 22 23 24 26 27 28 2A 2B 2C 2E 2F 30 32 33 34 34 37 38 39 3B 3C 3D 3F 40 41 42 44 45

I wondered if it was going wobbly straight after reset, so I put in a delay of a few seconds before the capture, but it made no difference.
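The dumps above can also be checked mechanically rather than by eye. The following is a minimal sketch in C, not part of the AN2548 demo: `capture_stats` and `check_capture` are hypothetical names, and it assumes the buffer holds 8-bit samples from a counter that should advance by exactly 1 (modulo 256) per captured sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How a captured counter sequence deviates from "+1 per sample".
 * Hypothetical helper type, not taken from the AN2548 project. */
typedef struct {
    size_t ok;        /* steps where the value advanced by exactly 1 (mod 256) */
    size_t repeated;  /* steps where the value did not change */
    size_t other;     /* any other step: skipped counts, backward jumps, corrupt bytes */
} capture_stats;

/* Classify every step between neighbouring samples in buf[0..len-1]. */
capture_stats check_capture(const uint8_t *buf, size_t len)
{
    capture_stats s = {0, 0, 0};

    assert(buf != NULL && len >= 2);

    for (size_t i = 1; i < len; i++) {
        /* The cast keeps the difference modulo 256, so FF -> 00 counts as +1. */
        uint8_t step = (uint8_t)(buf[i] - buf[i - 1]);

        if (step == 1) {
            s.ok++;
        } else if (step == 0) {
            s.repeated++;
        } else {
            s.other++;
        }
    }
    return s;
}
```

Run over the first eight bytes of the 400 kHz dump, every step lands in `ok`. The first eight bytes of the 420 kHz dump (46 44 49 4A 4B 4C 4C 4D) give four good steps, one repeat and two other steps.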

adrian5
Associate II
Posted on May 17, 2011 at 13:33

Well, I put all my assumptions to the test again today and found one very serious shortcoming: my external 8 MHz crystal wasn't oscillating! The internal clock set up by the bootloader was still running, 9x slower than I assumed.

However, with that fixed, the maximum transfer speed is still much lower than I expected. At DMA clock speeds above 3.4 MHz the count gets scrambled in just the same way as shown above. What else could be getting in the way of the DMA transfer?

tomas23
Associate II
Posted on May 17, 2011 at 13:33

Hi Adrian, the DMA needs around 9-11 clock cycles to finish a transfer from APB to RAM, which gives ~7 Mtransf./s. You can use overlapped mode (two DMA channels running from the same port to RAM, with different buffers) and achieve ~10.1 Mtransf./s @ 72 MHz.

The DMA needs 5 clocks for both AHB transfers, 2 cycles to begin the transaction, 1 cycle for the acknowledge, plus 2-3 cycles for accessing the APB bus. In overlapped mode, only the AHB+APB transfers are exclusive, giving you 7 cycles per transfer, thus ~10 Mtransf./s.
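The cycle budget above can be turned into a quick calculation. This is only a sketch of the arithmetic in this post: the constant and function names are made up for illustration, and the per-stage cycle counts are the ones quoted above, not measured values.

```c
#include <assert.h>

/* Cycle counts as quoted in the post above (illustrative names). */
enum {
    AHB_CYCLES     = 5,  /* both AHB transfers */
    START_CYCLES   = 2,  /* beginning the transaction */
    ACK_CYCLES     = 1,  /* acknowledge */
    APB_CYCLES_MIN = 2,  /* APB access, best case */
    APB_CYCLES_MAX = 3   /* APB access, worst case */
};

/* Cycles for one transfer on a single DMA channel. */
unsigned single_channel_cycles(unsigned apb_cycles)
{
    return AHB_CYCLES + START_CYCLES + ACK_CYCLES + apb_cycles;
}

/* Cycles per transfer in overlapped mode, where only the AHB and APB
 * phases are exclusive (best-case APB access assumed). */
unsigned overlapped_cycles(void)
{
    return AHB_CYCLES + APB_CYCLES_MIN;
}

/* Transfers per second for a given clock and cycles per transfer. */
double transfers_per_sec(double clock_hz, unsigned cycles_per_transfer)
{
    assert(cycles_per_transfer > 0);
    return clock_hz / cycles_per_transfer;
}
```

At 72 MHz this gives 7.2 Mtransf./s for 10 cycles and roughly 6.5 Mtransf./s for 11 cycles on one channel. Seven cycles in overlapped mode work out to roughly 10.3 Mtransf./s, in the same range as the ~10 Mtransf./s figure quoted above.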

armmcu
Associate II
Posted on May 17, 2011 at 13:33

Hi Adrian,

I think edison gave you the explanation, but you can also find it in the same document you refer to, AN2548 (DMA latency section).

adrian5
Associate II
Posted on May 17, 2011 at 13:33

Thanks edison - that does square with my findings (and the datasheet :-? ). This is bad news though, as I was hoping to transfer 16-bit ITU-R BT.656 video @ 13.5 Mtransf./s (skipping lines to make time for processing that will vastly scale down the image). I'm really sad now, as I was just getting to grips with the STM32. I guess I'll need to find a faster device. :-[

adrian5
Associate II
Posted on May 17, 2011 at 13:33

Quote:

On 06-12-2009 at 16:19, edison wrote:

You can use overlapped mode (two DMA channels running from the same port to RAM, with different buffers) and achieve ~10.1 Mtransf./s @ 72 MHz.

Perhaps if I clock the data into external latches I could widen the data to 32 bits and feed this into two 16-bit GPIO ports? This lowers the demand to 6.75 Mtransf./s - but I'm not sure if the device will support a couple of 16-bit transfers from IO to SRAM. Whether or not to spend/waste any more time investigating this is the big question for me now.
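For reference, here is the arithmetic behind that 6.75 figure, as a rough sketch in C. The function names are hypothetical, and it only computes how many 72 MHz clock cycles are available between DMA requests. It does not say whether two 16-bit transfers per request actually fit in that window.

```c
#include <assert.h>

/* Request rate after latching `factor` consecutive samples into one wider word. */
double widened_rate(double samples_per_sec, unsigned factor)
{
    assert(factor > 0);
    return samples_per_sec / factor;
}

/* Clock cycles available between two consecutive DMA requests. */
double cycles_per_request(double clock_hz, double requests_per_sec)
{
    assert(requests_per_sec > 0.0);
    return clock_hz / requests_per_sec;
}
```

With a 72 MHz clock, 13.5 Mtransf./s leaves about 5.3 cycles per request, while widening by a factor of 2 drops the request rate to 6.75 M/s and leaves about 10.7 cycles per request.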

slawcus
Associate II
Posted on May 17, 2011 at 13:33

We don't know what other constraints you may have (price, time, peripherals, ...), but I'd use a Blackfin for this. Maybe it's overkill for your application, but the Blackfin has a PPI for ITU-R BT.656, a video ALU, runs uClinux, ... On the other hand, it lacks internal flash.

armmcu
Associate II
Posted on May 17, 2011 at 13:33

Quote:

On 07-12-2009 at 10:37, adrian2 wrote:

but I'm not sure if the device will support a couple of 16-bit transfers from IO to SRAM. Whether or not to spend/waste any more time investigating this is the big question for me now.

Indeed, the same DMA request (generated by the external input event) can be used to trigger two transfers from I/O to SRAM, one from each of two different GPIO ports, across two different DMA channels.

I don't think you will need overlap mode in this case.

Cheers.