adrian5
Associate
December 9, 2009
Question

DMA GPIO->SRAM test is not giving expected results

Posted on December 09, 2009 at 05:55




    adrian5 (Author)
    Associate
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    To show what's happening, I've changed the DMA mode to circular and let the buffer fill. After a short while the DMA is disabled and part of the buffer is read out. Here's some cut'n'paste showing the sequence at 400 kHz (where it works OK):

    Code:

    DMA enabled. Buffer filling in circular mode while printf prints this message

    DMA disabled now. Here are the first 128 values in the buffer:

    59 5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78

    79 7A 7B 7C 7D 7E 7F 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98

    99 9A 9B 9C 9D 9E 9F A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF B0 B1 B2 B3 B4 B5 B6 B7 B8

    B9 BA BB BC BD BE BF C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF D0 D1 D2 D3 D4 D5 D6 D7 D8

    At around 420 kHz it starts to go off the rails:

    Code:

    DMA enabled. Buffer filling in circular mode while printf prints this message

    DMA disabled now. Here are the first 128 values in the buffer:

    46 44 49 4A 4B 4C 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 61 62 64 65 66

    67 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7F 40 81 82 83 83 84 85

    86 87 88 89 8A 8B 8C 8D 8E 8F 90 91 92 93 94 95 96 97 98 99 9B 9C 9D 9E 9F 9F A0 A1 A2 A3 A4 A5

    A6 A7 A8 A9 AA AB AC AD AE AF B0 B1 B2 B3 B4 B5 B6 B7 B9 BA BB BC BC BD BE BF C0 C1 C2 C3 C4 C5

    At 1 MHz it's lost it completely:

    Code:

    DMA enabled. Buffer filling in circular mode while printf prints this message

    DMA disabled now. Here are the first 128 values in the buffer:

    9F A1 A2 A3 A5 A6 A7 A9 AA AB AC AE AF B0 B2 B3 B4 B6 B7 B8 BA BB BC BC BF C0 C1 C3 C4 C5 C7 C8

    C9 CA CC CD CE D0 D1 D2 D4 D5 D6 D4 D9 DA DA DD DE DF E1 E2 E3 E5 E6 E7 E8 EA EB EC EE EF F0 F2

    F3 F4 F6 F7 F8 FA FB FC FD FF 00 01 03 04 05 06 08 09 0A 0C 0D 0E 00 11 12 14 15 16 16 19 1A 1B

    1D 1E 1F 21 22 23 24 26 27 28 2A 2B 2C 2E 2F 30 32 33 34 34 37 38 39 3B 3C 3D 3F 40 41 42 44 45
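Spotting the dropped counts by eye gets tedious. A minimal host-side sketch for checking dumps like these, assuming the captured bytes come from the free-running 8-bit counter on port B, so every sample should be the previous one plus 1 (mod 256). The function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the places where a sample is not exactly the previous sample + 1
 * (mod 256). If first_bad is non-NULL it receives the index of the first
 * offending sample, or len when the whole buffer is consistent. */
size_t count_discontinuities(const uint8_t *buf, size_t len, size_t *first_bad)
{
    size_t bad = 0;
    size_t first = len;
    assert(buf != NULL || len == 0);
    for (size_t i = 1; i < len; i++) {
        /* Casting back to uint8_t wraps, so FF -> 00 is a valid step. */
        uint8_t expected = (uint8_t)(buf[i - 1] + 1u);
        if (buf[i] != expected) {
            if (first == len)
                first = i;
            bad++;
        }
    }
    if (first_bad != NULL)
        *first_bad = first;
    return bad;
}
```

For example, the first eight bytes of the 420 kHz dump (46 44 49 4A 4B 4C 4C 4D) give three discontinuities, the first at index 1, while the 128 values from the 400 kHz dump give none. Note that one bad sample usually counts twice: once going wrong and once coming back into step.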

    I wondered if it was going wobbly straight after reset, so I put in a few seconds' delay before the capture, but it made no difference.
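One detail when reading the circular-mode buffer back: once the DMA has wrapped, index 0 is not the oldest sample. On the STM32F1 the channel's transfer counter (CNDTR) counts down once per transfer and reloads in circular mode, so the next slot to be written, and therefore the oldest surviving data, is at len - CNDTR. A minimal host-side sketch, assuming `remaining` is the CNDTR value read after disabling the channel (for example with `DMA_GetCurrDataCounter()` in the StdPeriph library) and that the buffer has wrapped at least once; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a circular DMA buffer into out[] in the order the samples arrived.
 * 'remaining' is the channel's transfer counter read after the channel was
 * stopped: the DMA would have written next at len - remaining, which is
 * therefore where the oldest sample sits. */
void unroll_circular(const uint8_t *buf, size_t len, size_t remaining,
                     uint8_t *out)
{
    assert(len > 0 && remaining <= len);
    size_t oldest = (len - remaining) % len;
    for (size_t i = 0; i < len; i++)
        out[i] = buf[(oldest + i) % len];
}
```

Printing the unrolled copy instead of the raw buffer means a run of good counts is only broken by genuine glitches, not by the wrap point.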

    adrian5 (Author)
    Associate
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    I have built the GPIO_SRAM demo project described in AN2548 and configured PA.06 (capture clock signal) as an input for an external data source, as described in the readme. To test it, I have a variable clock source (a digital signal generator) and an 8-bit counter chip on port B. The counter increments on the rising clock edge, and I have configured the timer pin to clock on the falling edge. All waveforms look good at the chip. I've remembered to remap a couple of the port B pins from their JTAG alternate functions, because AN2548 uses port D and my chip is a 64-pin STM32F103RBT6.

    When I look in the buffer, I see the count increment perfectly for all clock frequencies below about 400 kHz, but above that, counts start to go missing. I was expecting to get 18 (72/4) MB/s, not 0.4!
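The 18 figure is the bus clock divided by the cycles taken per transfer, times one byte per transfer. A tiny sketch of that arithmetic, assuming the 4 cycles per transfer implied by 72/4 (the function name is illustrative):

```c
#include <assert.h>

/* Peak DMA throughput in bytes per second when each transfer moves
 * bytes_per_transfer bytes and takes cycles_per_transfer bus cycles. */
double dma_bytes_per_sec(double hclk_hz, unsigned cycles_per_transfer,
                         unsigned bytes_per_transfer)
{
    assert(cycles_per_transfer > 0);
    return hclk_hz / cycles_per_transfer * bytes_per_transfer;
}
```

`dma_bytes_per_sec(72e6, 4, 1)` gives 18e6, the 18 MB/s expected here; plugging in 10 cycles instead gives 7.2e6, which shows how quickly the ceiling drops as the per-transfer cost grows.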

    I am using my own PCB with the serial flash loader, and the project is built in the Ride environment. In main, after the supplied setup code, I poll the half-full flag and disable DMA when it is set. Then I print the buffer contents to a terminal program over the UART, so I can see a snapshot of the buffer as it filled. What might be preventing me from getting the 18 MB/s I expected?
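For the readout step, a sketch of a formatter that prints the buffer as rows of 32 hex bytes, like the dumps at the top of the thread. It assumes printf is retargeted to the UART; `format_hex_row` and `dump_hex` are hypothetical names, and the flag polling and DMA disable are hardware-specific and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BYTES_PER_ROW 32

/* Write n bytes as "59 5A 5B ..." into out, NUL-terminated.
 * out needs room for 3 * n characters (at least 1 when n == 0).
 * Returns the length of the string written. */
size_t format_hex_row(const uint8_t *row, size_t n, char *out, size_t out_size)
{
    size_t pos = 0;
    assert(out_size >= (n ? 3 * n : 1));
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            out[pos++] = ' ';
        pos += (size_t)snprintf(out + pos, out_size - pos, "%02X",
                                (unsigned)row[i]);
    }
    return pos;
}

/* Print the whole buffer, BYTES_PER_ROW values per line. */
void dump_hex(const uint8_t *buf, size_t len)
{
    char line[3 * BYTES_PER_ROW];
    for (size_t off = 0; off < len; off += BYTES_PER_ROW) {
        size_t n = (len - off < BYTES_PER_ROW) ? len - off : BYTES_PER_ROW;
        format_hex_row(buf + off, n, line, sizeof line);
        printf("%s\n", line);
    }
}
```

Keeping the row formatting separate from the printing makes it easy to check on a PC before it goes onto the board.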

    tomas23
    Visitor II
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    Hi Adrian, the DMA needs around 9-11 clock cycles to complete each transfer from APB to RAM, which gives ~7 Mtransfers/s. You can use overlapped mode (two DMA channels running from the same port to RAM, into different buffers) and achieve ~10.1 Mtransfers/s at 72 MHz.

    The DMA needs 5 cycles for the two AHB transfers, 2 cycles to begin the transaction, 1 cycle for the acknowledge and 2-3 cycles to access the APB bus. In overlapped mode, only the AHB and APB transfers are exclusive, giving 7 cycles per transfer and thus about 10 Mtransfers/s.
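The cycle budget above can be checked with a few lines of arithmetic. A sketch, taking the per-phase cycle counts as quoted in this reply (an assumption, not figures checked against the datasheet); struct and function names are illustrative.

```c
#include <assert.h>

/* Per-transfer DMA cycle budget, using the phase figures quoted above. */
typedef struct {
    unsigned ahb;   /* both AHB transfers */
    unsigned setup; /* beginning the transaction */
    unsigned ack;   /* acknowledge */
    unsigned apb;   /* accessing the APB bus (2-3) */
} dma_phases;

/* One channel: all phases happen back to back. */
unsigned cycles_single(dma_phases p)
{
    return p.ahb + p.setup + p.ack + p.apb;
}

/* Two overlapped channels: only the AHB + APB cycles are exclusive, the
 * setup and acknowledge of one channel overlap the other's bus phases. */
unsigned cycles_overlapped(dma_phases p)
{
    return p.ahb + p.apb;
}

/* Millions of transfers per second for a system clock given in MHz. */
double mtransfers(double sysclk_mhz, unsigned cycles)
{
    assert(cycles > 0);
    return sysclk_mhz / cycles;
}
```

With {5, 2, 1, 2} this gives 10 cycles (7.2 M/s at 72 MHz) for one channel and 7 cycles (about 10.3 M/s) overlapped, in line with the ~7 and ~10 M/s quoted; with a 3-cycle APB access the counts become 11 and 8.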

    adrian5 (Author)
    Associate
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    Well, I put all my assumptions to the test again today and found one very serious shortcoming: my external 8 MHz crystal wasn't oscillating! The internal clock set up by the bootloader was still running, 9x slower than I assumed.

    However, with that fixed, the maximum transfer rate is still much lower than I expected. At DMA clock rates above 3.4 MHz the count gets scrambled in just the same way as shown above. What else could be getting in the way of the DMA transfer?
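One way to compare the two failure points is the number of system clock cycles available per captured sample. Taking the 9x figure above to mean the part was running at about 8 MHz (72 MHz / 9) before the fix and at 72 MHz after it, a sketch of the arithmetic (function name illustrative):

```c
#include <assert.h>

/* System clock cycles available between successive samples. */
double cycles_per_sample(double sysclk_hz, double sample_hz)
{
    assert(sample_hz > 0.0);
    return sysclk_hz / sample_hz;
}
```

`cycles_per_sample(8e6, 400e3)` is 20 and `cycles_per_sample(72e6, 3.4e6)` is about 21.2, so in both cases trouble starts at roughly 20 cycles per sample, which would fit a limit that scales with the system clock.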

    adrian5 (Author)
    Associate
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    Thanks edison, that does square with my findings (and the datasheet :-? ). This is bad news, though, as I was hoping to transfer 16-bit ITU-R BT.656 video at 13.5 Mtransfers/s (skipping lines to make time for processing that will vastly scale down the image). I'm really sad now, as I was just getting to grips with the STM32. I guess I'll need to find a faster device. :-[

    armmcu
    Associate II
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    Hi Adrian,

    I think edison gave you the explanation, but you can also find it in AN2548, the same document you refer to (DMA latency section).

    slawcus
    Associate II
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    We don't know what other constraints you may have (price, time, peripherals, ...), but I'd use a Blackfin for this. Maybe it's overkill for your application, but the Blackfin has a PPI for ITU-R 656 and a video ALU, and it runs uClinux, ... On the other hand, it lacks internal flash.

    adrian5 (Author)
    Associate
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    Quote:

    On 06-12-2009 at 16:19, edison wrote:

    You can use overlapped mode (two DMA channels running from same port to RAM, different buffers) and you achieve ~10.1 Mtransf./s @ 72 MHz.

    Perhaps if I clock the data into external latches, I could widen it to 32 bits and feed it into two 16-bit GPIO ports? That lowers the demand to 6.75 Mtransfers/s, but I'm not sure whether the device will support a couple of 16-bit transfers from I/O to SRAM. Whether or not to spend (or waste) any more time investigating this is the big question for me now.
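The rate arithmetic behind the latch idea, as a sketch: latching two 16-bit samples per capture halves the request rate each channel sees. Since both channels would be served by the same DMA controller (the STM32F103RB has only DMA1), the sketch also reports the combined number of transfers it has to move per second. Names are illustrative, and the ~7 and ~10.1 Mtransfers/s ceilings are the figures quoted earlier in the thread, not measured here.

```c
#include <assert.h>

/* DMA load for capturing a sample stream. */
typedef struct {
    double requests_hz;        /* capture events per second, per channel */
    double total_transfers_hz; /* transfers per second across all channels */
} dma_load;

/* sample_hz:           incoming samples per second (e.g. 13.5e6)
 * samples_per_capture: samples latched per capture event
 * channels:            DMA channels each moving part of every capture */
dma_load capture_load(double sample_hz, unsigned samples_per_capture,
                      unsigned channels)
{
    assert(samples_per_capture > 0 && channels > 0);
    dma_load l;
    l.requests_hz = sample_hz / samples_per_capture;
    l.total_transfers_hz = l.requests_hz * channels;
    return l;
}
```

`capture_load(13.5e6, 1, 1)` gives 13.5 M requests/s on one channel, while `capture_load(13.5e6, 2, 2)` gives 6.75 M requests/s per channel but still 13.5 M transfers/s in total; the total is the number to compare against the quoted ceilings.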

    adrian5 (Author)
    Associate
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    slawcus, I think a Blackfin is really too big a hammer to crack this nut. The objective is to scale a video feed down to a vague impression of the colours at the top, bottom and sides of the frame, to create background illumination behind a TV screen. It's really only a home experiment.
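For the "vague impression of the colours" step, the heavy scaling can be as simple as averaging a band along each edge of the frame. A sketch, assuming one colour component has already been separated into its own 8-bit plane in SRAM with a known row stride (a hypothetical layout; the function name is illustrative too):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean of an 8-bit plane over the rectangle [x0,x1) x [y0,y1).
 * 'stride' is the number of bytes between the starts of successive rows. */
unsigned region_mean(const uint8_t *plane, size_t stride,
                     size_t x0, size_t y0, size_t x1, size_t y1)
{
    assert(x1 > x0 && y1 > y0 && x1 <= stride);
    unsigned long sum = 0;
    for (size_t y = y0; y < y1; y++)
        for (size_t x = x0; x < x1; x++)
            sum += plane[y * stride + x];
    return (unsigned)(sum / ((x1 - x0) * (y1 - y0)));
}
```

For a frame w pixels wide and h high with a band of b rows or columns, the four edges would be `region_mean(p, w, 0, 0, w, b)` (top), `region_mean(p, w, 0, h - b, w, h)` (bottom), `region_mean(p, w, 0, 0, b, h)` (left) and `region_mean(p, w, w - b, 0, w, h)` (right), repeated per component.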

    Quote:

    Indeed, the same DMA request (generated by an external input event) can be used to trigger two transfers from I/O to SRAM, from two different GPIOs and across two different DMA channels.

    I don't think you will need overlap mode in this case.

    armmcu.engineer, I didn't realise that the same request could initiate two transfers! Now I realise that I haven't understood how DMA is linked to a timer. AN2548 configures DMA channel 6 to transfer from GPIO to RAM and then enables the Timer 3 DMA request with:

    /* Enable TIM3 DMA */

    TIM_DMACmd(TIM3, TIM_DMA_CC1, ENABLE);

    How does this specify DMA channel 6? Put another way, if another DMA channel is configured for a different source/destination, what command would enable it to be clocked from the same timer pin?
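On the STM32F10x the channel is not chosen in software: each peripheral DMA request is wired to one fixed DMA1 channel (the DMA1 request-mapping table in the RM0008 reference manual), and `TIM_DMACmd` only enables the request at the timer. TIM3_CH1 is routed to channel 6, which is why AN2548 configures that channel. A sketch of the TIM3 rows of that table as a lookup, as I read RM0008 (worth double-checking against the manual); the enum and function names are hypothetical.

```c
#include <assert.h>

/* TIM3 DMA request sources (hypothetical names for this sketch). */
typedef enum {
    TIM3_REQ_CH1,
    TIM3_REQ_CH2,
    TIM3_REQ_CH3,
    TIM3_REQ_CH4,
    TIM3_REQ_UP,
    TIM3_REQ_TRIG
} tim3_request;

/* DMA1 channel a TIM3 request is hard-wired to on STM32F10x, per the
 * RM0008 request-mapping table; 0 means the source has no DMA request. */
int tim3_dma1_channel(tim3_request req)
{
    assert(req <= TIM3_REQ_TRIG);
    switch (req) {
    case TIM3_REQ_CH3:
        return 2;
    case TIM3_REQ_CH4:
    case TIM3_REQ_UP:
        return 3;
    case TIM3_REQ_CH1:
    case TIM3_REQ_TRIG:
        return 6;
    case TIM3_REQ_CH2:
    default:
        return 0;
    }
}
```

`tim3_dma1_channel(TIM3_REQ_CH1)` returns 6, matching the channel AN2548 sets up for the PA.06 capture.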

    armmcu
    Associate II
    May 17, 2011
    Posted on May 17, 2011 at 13:33

    Quote:

    On 07-12-2009 at 10:37, adrian2 wrote:

    but I'm not sure if the device will support a couple of 16-bit transfers from IO to SRAM. Whether or not to spend/waste any more time investigating this is the big question for me now.

    Indeed, the same DMA request (generated by an external input event) can be used to trigger two transfers from I/O to SRAM, from two different GPIOs and across two different DMA channels.

    I don't think you will need overlap mode in this case.

    Cheers.