
I get an STM32F103 DMA speed of 160 bytes in 1600 clock cycles => 1 transfer every 10 clock cycles. What else is contending for the bus? It just seems a bit slow.

Elektraglide
Associate III

This is from a project I've been building (color display output), blogged here:

https://medium.com/@adambillyard/experiments-in-hard-realtime-35136ed79398


Read AN2548.

The absolute minimum in case of AHB-to-AHB transfer is 5 cycles. APB adds cycles. Collisions with other masters add cycles. Slow source/target (e.g. FLASH) add cycles. 10 cycles sounds quite normal.
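To put those figures side by side, here is a rough sketch of the arithmetic. The 72 MHz clock is an assumption taken from later in the thread, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Average bus clock cycles consumed per DMA transfer. */
double cycles_per_transfer(uint32_t total_cycles, uint32_t transfers)
{
    return (double)total_cycles / (double)transfers;
}

/* Transfers per second for a given clock and cycles-per-transfer figure. */
double transfers_per_second(double clock_hz, double cycles_each)
{
    return clock_hz / cycles_each;
}
```

With the numbers from the question, cycles_per_transfer(1600, 160) gives 10, and transfers_per_second(72e6, 10.0) gives 7.2 million transfers per second. The 5-cycle minimum would give 14.4 million at the same assumed clock.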

You may want to post details if you want to discuss this further.

JW

Well similar vintage here, but started with 2 MHz 6502, then Z80's, 68K, 808x and ARM (86)

Several people have had luck with the F4 doing VGA; DMA Mem-to-GPIO was perhaps capable of 21 MHz. The F429 has an LTDC to shovel data to a resistor DAC.

Flash on the F1 is slow (perhaps 35 ns), and there is no hardware assist to mask it.

https://www.artekit.eu/vga-output-using-a-36-pin-stm32/

https://hackaday.io/project/173682-color-ascii-terminal

https://www.youtube.com/watch?v=5UFpp3ao460

https://www.youtube.com/watch?v=5u9ksKwvqe4

Elektraglide
Associate III

This is memory-to-memory mode, SRAM to GPIOA so I was expecting closer to the 5 cycle minimum. For this test, I'm keeping the CPU idle (__NOP spin loop), so I wasn't expecting contention.

Danish1
Lead III

Looking at your blog, I see that you are reading SRAM by word and writing to GPIOA by byte:

DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;

DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;

So the DMA unit is deconstructing the words to bytes, i.e. it is doing one read-from-RAM then four write-to-GPIO cycles.

If read-from-RAM took any time at all, one might see every fourth pixel being stretched. I haven't studied your images closely, but I don't see that.

STM32F103 is relatively old. I see GPIO is on APB2, which (to save power) may be clocked more slowly than AHB. What clock are you feeding to APB2?

(APB1 may be clocked to a maximum of 36 MHz but APB2 can go the full 72 MHz).

The division ratio is programmed in RCC->CFGR. What value do you put in there?

(I access registers directly because I find the best available documentation is the Reference Manual, so I don't know the equivalent InitStructure stuff.)

Hope this helps,

Danish

> SRAM to GPIOA

In 'F1, GPIO are on APB bus. Read AN2548 for what that means, timing-wise.

Also, try to make sure you have a genuine ST-made STM32F103, if you must stick to 'F1 (which I don't recommend either).

> DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;

> DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;

> So the DMA unit is deconstructing the words to bytes, i.e. it is doing one read-from-RAM then four write-to-GPIO cycles.

No, it does not. (Assuming the Cube/HAL gibberish does what it seems to do, i.e. set PSIZE to 32-bit and MSIZE to 8-bit in the channel's control register.) The single-port DMA in 'F1 (and 'F0/'F3/'Lx/'Gx) does not support data packing/unpacking. It reads a word, throws away the three uppermost bytes, and writes the lowest one. This means that EVERY transfer consists of BOTH a source read (here SRAM) and a destination write (here GPIO). See the "Programmable data width and endian behavior" table in the DMA chapter of RM0008 (that table is written with the rarely used PINC=1 setting, but the point is the same).
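A minimal host-side model of that behaviour, as a sketch only: it assumes a 32-bit source width with source increment enabled (so the source address advances by 4 per transfer, as I read the RM0008 table), an 8-bit fixed destination, and little-endian data. The function name dma_model_word_to_byte is hypothetical and nothing here touches real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Model of `count` DMA transfers with 32-bit source width and 8-bit
 * destination width, source incrementing, destination fixed.
 * Every transfer does one full word read and one byte write; the upper
 * three bytes of each word are discarded (no packing/unpacking).
 * `src` must hold at least 4 * count bytes. Returns the number of source reads.
 */
size_t dma_model_word_to_byte(const uint8_t *src, size_t count,
                              uint8_t *dst, uint8_t *trace)
{
    size_t reads = 0;
    for (size_t i = 0; i < count; i++) {
        const uint8_t *p = src + 4u * i;   /* source address steps by 4 */
        uint32_t word = (uint32_t)p[0]
                      | ((uint32_t)p[1] << 8)
                      | ((uint32_t)p[2] << 16)
                      | ((uint32_t)p[3] << 24);
        reads++;                           /* one source read per transfer */
        uint8_t low = (uint8_t)(word & 0xFFu);
        *dst = low;                        /* one destination write per transfer */
        trace[i] = low;                    /* record what reached the "port" */
    }
    return reads;
}
```

In this model, two transfers over the bytes 10 11 12 13 20 21 22 23 (hex) write 0x10 and then 0x20 to the destination, with two source reads in total.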

JW

Thank you! That is the doc I knew I'd seen somewhere but could never find again.

I am triggering DMA in software (i.e. not a DRQ from a peripheral). It is not obvious to me what clocks the DMA in this case. Given that I'm seeing 10 clocks / transfer and the nominal latency is 5 clocks, it would seem to indicate that the DMA is being clocked at 36 MHz. But I cannot see anywhere to explicitly set this:

	// driving GPIOA with DMA1
	RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA, ENABLE);
	
	// 8 bits of output
	GPIO_InitTypeDef GPIO_InitDef;
	GPIO_StructInit(&GPIO_InitDef);
	GPIO_InitDef.GPIO_Pin = GPIO_Pin_0 | GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_3| GPIO_Pin_4 | GPIO_Pin_5 | GPIO_Pin_6 | GPIO_Pin_7;
	GPIO_InitDef.GPIO_Mode = GPIO_Mode_Out_PP;
	GPIO_InitDef.GPIO_Speed = GPIO_Speed_50MHz;
	GPIO_Init(GPIOA, &GPIO_InitDef);
	
	// MEM2MEM mode aka manual triggered by hsync.  NB MEM2MEM expects DMA_DIR_PeripheralSRC
	RCC_AHBPeriphClockCmd(RCC_AHBPeriph_DMA1, ENABLE );
	DMA_InitTypeDef DMA_InitStructure;
	DMA_InitStructure.DMA_BufferSize = GFX_XRES;
	DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralSRC;
	DMA_InitStructure.DMA_M2M = DMA_M2M_Enable;
	DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)GfxBuffer;
	DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;
	DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Enable;
	DMA_InitStructure.DMA_Mode = DMA_Mode_Normal;
	DMA_InitStructure.DMA_MemoryBaseAddr = (uint32_t)&GPIOA->ODR;
	DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
	DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Disable;
	DMA_InitStructure.DMA_Priority = DMA_Priority_Medium;
	DMA_Init(DMA1_Channel5, &DMA_InitStructure);

Your description of the MSIZE / PSIZE matches what I understood. However, in this code snippet, I read DataSize_Word and write DataSize_Byte, which is definitely faster than reading Byte and writing Byte (I can visually see my pixel image get wider). That is not what I expected.

Danish1
Lead III

Sorry I was wrong about the DMA doing data packing / unpacking. As Jan said, it does read the RAM word each time and writes just one byte to GPIO.

But you still haven't answered what clock rate APB2 is running at. It might be only 36 MHz even if AHB is running at 72 MHz.

It will be in the RCC section.

Danish

I'm using the system_stm32f10x.c. I single step through the code and see:

    /* HCLK = SYSCLK */
    RCC->CFGR |= (uint32_t)RCC_CFGR_HPRE_DIV1;
      
    /* PCLK2 = HCLK */
    RCC->CFGR |= (uint32_t)RCC_CFGR_PPRE2_DIV1;
    
    /* PCLK1 = HCLK */
    RCC->CFGR |= (uint32_t)RCC_CFGR_PPRE1_DIV2;

Which I read as APB1 running at 36 MHz and APB2 running at 72 MHz, right?
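One way to double-check that reading is to decode the prescaler fields from the value of RCC->CFGR. The sketch below assumes the STM32F10x field layout as I understand it from RM0008 (HPRE in bits 7:4, PPRE1 in bits 10:8, PPRE2 in bits 13:11). The macro and function names are made up for illustration, and the code runs on a host against a plain integer rather than the real register.

```c
#include <stdint.h>

/* Assumed field positions in RCC_CFGR on STM32F10x. */
#define CFGR_HPRE_SHIFT   4u   /* bits 7:4,   AHB prescaler  */
#define CFGR_PPRE1_SHIFT  8u   /* bits 10:8,  APB1 prescaler */
#define CFGR_PPRE2_SHIFT  11u  /* bits 13:11, APB2 prescaler */

/* AHB divider: 0xxx means /1, 1000..1111 mean /2 ... /512 (there is no /32). */
uint32_t ahb_divider(uint32_t cfgr)
{
    static const uint16_t table[8] = {2, 4, 8, 16, 64, 128, 256, 512};
    uint32_t hpre = (cfgr >> CFGR_HPRE_SHIFT) & 0xFu;
    return (hpre & 0x8u) ? table[hpre & 0x7u] : 1u;
}

/* APB divider: 0xx means /1, 100..111 mean /2, /4, /8, /16. */
uint32_t apb_divider(uint32_t cfgr, uint32_t shift)
{
    uint32_t ppre = (cfgr >> shift) & 0x7u;
    return (ppre & 0x4u) ? (2u << (ppre & 0x3u)) : 1u;
}
```

For the settings shown above (HPRE_DIV1, PPRE2_DIV1, PPRE1_DIV2) the prescaler bits come to 0x400, which decodes as AHB /1, APB1 /2 and APB2 /1. With a 72 MHz SYSCLK that is 36 MHz on APB1 and 72 MHz on APB2. On the target, the argument would be the value read from RCC->CFGR.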

Re: DMA data packing. Any idea why setting the DMA source (SRAM) to Word size is around 10-20% faster than Byte? Everything I've read - and everything I hear from this thread - is that there is no FIFO to benefit from reading 4 bytes (i.e. it's always tossed).

Elektraglide
Associate III

Perhaps a simpler solution: Can anyone point me at some code example that sets up DMA1 running at 72MHz?