
11 replies

waclawek.jan
Super User
November 30, 2021

Read AN2548.

The absolute minimum in case of an AHB-to-AHB transfer is 5 cycles. APB adds cycles. Collisions with other masters add cycles. A slow source/target (e.g. FLASH) adds cycles. 10 cycles sounds quite normal.
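
To put those cycle counts in perspective, they translate directly into a byte rate on the pins. The sketch below assumes a 72 MHz HCLK and one byte moved per transfer; the function names are made up for illustration and are not part of any ST library.

```c
#include <assert.h>
#include <stdint.h>

/* Transfers per second for a DMA channel that needs `cycles_per_transfer`
 * bus clocks for each read+write pair. Illustrative helper only. */
uint32_t dma_transfers_per_second(uint32_t hclk_hz, uint32_t cycles_per_transfer)
{
    assert(cycles_per_transfer > 0u);
    return hclk_hz / cycles_per_transfer;
}

/* Time per transfer in picoseconds (integer maths, rounded down). */
uint64_t dma_transfer_time_ps(uint32_t hclk_hz, uint32_t cycles_per_transfer)
{
    assert(hclk_hz > 0u);
    return ((uint64_t)cycles_per_transfer * 1000000000000ULL) / hclk_hz;
}
```

At 72 MHz this gives 14.4 M transfers/s for the 5-cycle minimum and 7.2 M transfers/s (about 139 ns each) for 10 cycles.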

You may want to post details if you want to discuss this further.

JW

Tesla DeLorean
Guru
November 30, 2021

Well similar vintage here, but started with 2 MHz 6502, then Z80's, 68K, 808x and ARM (86)

Several people have had luck with the F4 doing VGA; DMA Mem-to-GPIO was perhaps capable of 21 MHz. The F429 has an LTDC to shovel data to a resistor DAC.

Flash on the F1 is slow (perhaps 35ns), and there is no hardware assist to mask it.

https://www.artekit.eu/vga-output-using-a-36-pin-stm32/

https://hackaday.io/project/173682-color-ascii-terminal

https://www.youtube.com/watch?v=5UFpp3ao460

https://www.youtube.com/watch?v=5u9ksKwvqe4

Elektraglide
Associate II
December 1, 2021

This is memory-to-memory mode, SRAM to GPIOA so I was expecting closer to the 5 cycle minimum. For this test, I'm keeping the CPU idle (__NOP spin loop), so I wasn't expecting contention.

Danish1
Lead III
December 1, 2021

Looking at your blog, I see that you are reading SRAM by word and writing to GPIOA by byte:

DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;

DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;

So the DMA unit is deconstructing the words to bytes, i.e. it is doing one read-from-RAM then four write-to-GPIO cycles.

If read-from-RAM took any time at all, one might see every fourth pixel being stretched. I haven't studied your images closely, but I don't see that.

STM32F103 is relatively old. I see GPIO is on APB2, which (to save power) may be clocked more slowly than AHB. What clock are you feeding to APB2?

(APB1 may be clocked to a maximum of 36 MHz but APB2 can go the full 72 MHz).

The division ratio is programmed in RCC->CFGR. What value do you put in there?

(I access registers directly because I find the best available documentation is the Reference Manual. So I don't know the equivalent initstructure stuff.)
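
For anyone wanting to check the division ratios from a raw RCC->CFGR readback, here is a minimal sketch of the decoding. The field positions and prescaler encodings are as I read them in RM0008 for the STM32F1; the macro and function names are local inventions, not the ST header names.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in RCC_CFGR on the STM32F1 (per RM0008):
 * HPRE = bits 7:4, PPRE1 = bits 10:8, PPRE2 = bits 13:11. */
#define CFGR_HPRE_SHIFT   4u
#define CFGR_PPRE1_SHIFT  8u
#define CFGR_PPRE2_SHIFT  11u

/* APB prescaler encoding: 0xx = /1, 100 = /2, 101 = /4, 110 = /8, 111 = /16. */
uint32_t apb_divider(uint32_t ppre_field)
{
    assert(ppre_field < 8u);
    if ((ppre_field & 0x4u) == 0u) {
        return 1u;
    }
    return 2u << (ppre_field & 0x3u);
}

/* AHB prescaler encoding: 0xxx = /1, 1000..1011 = /2../16,
 * 1100..1111 = /64../512 (there is no /32 step). */
uint32_t ahb_divider(uint32_t hpre_field)
{
    assert(hpre_field < 16u);
    if ((hpre_field & 0x8u) == 0u) {
        return 1u;
    }
    uint32_t n = hpre_field & 0x7u;
    return (n < 4u) ? (2u << n) : (64u << (n - 4u));
}

/* HCLK, PCLK1 and PCLK2 derived from SYSCLK and a CFGR value. */
uint32_t hclk_hz(uint32_t sysclk_hz, uint32_t cfgr)
{
    return sysclk_hz / ahb_divider((cfgr >> CFGR_HPRE_SHIFT) & 0xFu);
}

uint32_t pclk1_hz(uint32_t sysclk_hz, uint32_t cfgr)
{
    return hclk_hz(sysclk_hz, cfgr) / apb_divider((cfgr >> CFGR_PPRE1_SHIFT) & 0x7u);
}

uint32_t pclk2_hz(uint32_t sysclk_hz, uint32_t cfgr)
{
    return hclk_hz(sysclk_hz, cfgr) / apb_divider((cfgr >> CFGR_PPRE2_SHIFT) & 0x7u);
}
```

On the target one would pass the value read from RCC->CFGR; with only PPRE1 set to /2 (prescaler bits 0x00000400) and a 72 MHz SYSCLK this yields PCLK1 = 36 MHz and PCLK2 = 72 MHz.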

Hope this helps,

Danish

waclawek.jan
Super User
December 1, 2021

> SRAM to GPIOA

In 'F1, GPIO are on APB bus. Read AN2548 for what that means, timing-wise.

Also, try to make sure you have a genuine ST-made STM32F103, if you must stick to 'F1 (which I don't recommend either).

> DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;

> DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;

> So the DMA unit is deconstructing the words to bytes, i.e. it is doing one read-from-RAM then four write-to-GPIO cycles.

No, it does not. (Assuming the library gibberish does what it seems to do, i.e. sets PSIZE to 32-bit and MSIZE to 8-bit in the channel's control register.) The single-port DMA in 'F1 (and 'F0/'F3/'Lx/'Gx) does not support data packing/unpacking. It reads a word, throws away the three uppermost bytes, and writes the remaining lowest byte. This means that EVERY transfer consists of BOTH a source (here SRAM) read and a destination (here GPIO) write. See the "Programmable data width and endian behavior" table in the DMA chapter of RM0008 (that table is written for the rarely used PINC=1 setting, but the point is the same).
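
The behaviour described above can be mimicked in plain C. This is only a software model of my reading of that RM0008 table (32-bit incrementing source, 8-bit fixed destination, little-endian memory), not driver code, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the F1 DMA with a 32-bit incrementing source and an 8-bit
 * destination. Each transfer reads one whole word and writes only its least
 * significant byte; nothing is packed or queued between transfers.
 * Returns the number of bytes "written" into `written`. */
size_t model_dma_word_to_byte(const uint8_t *src, size_t src_len,
                              uint8_t *written, size_t n_transfers)
{
    assert(src != NULL && written != NULL);
    size_t done = 0;
    for (size_t i = 0; i < n_transfers; i++) {
        size_t addr = i * 4u;                 /* source address steps by one word */
        if (addr + 4u > src_len) {
            break;                            /* ran off the end of the buffer */
        }
        uint32_t word = (uint32_t)src[addr]
                      | ((uint32_t)src[addr + 1u] << 8)
                      | ((uint32_t)src[addr + 2u] << 16)
                      | ((uint32_t)src[addr + 3u] << 24);
        written[done++] = (uint8_t)word;      /* upper three bytes are discarded */
    }
    return done;
}
```

Feeding it a buffer holding 0, 1, 2, ... shows the consequence: only bytes 0, 4, 8, 12, ... ever reach the destination, one per read/write pair.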

JW

Elektraglide
Associate II
December 2, 2021

Thank you! That is the doc I knew I'd seen somewhere but could never find again.

I am triggering DMA in software (ie not a DRQ from a peripheral). It is not obvious to me what clocks the DMA in this case - given I'm seeing 10 clocks / transfer and the nominal latency is 5 clocks, it would seem to indicate DMA is being clocked at 36MHz. But I cannot see anywhere to explicitly set this:

	// driving GPIOA with DMA1
	RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA, ENABLE);
	
 // 8-bits of output
	GPIO_InitTypeDef GPIO_InitDef;
	GPIO_StructInit(&GPIO_InitDef);
	GPIO_InitDef.GPIO_Pin = GPIO_Pin_0 | GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_3| GPIO_Pin_4 | GPIO_Pin_5 | GPIO_Pin_6 | GPIO_Pin_7;
	GPIO_InitDef.GPIO_Mode = GPIO_Mode_Out_PP;
	GPIO_InitDef.GPIO_Speed = GPIO_Speed_50MHz;
	GPIO_Init(GPIOA, &GPIO_InitDef);
	
	// MEM2MEM mode aka manual triggered by hsync. NB MEM2MEM expects DMA_DIR_PeripheralSRC
	RCC_AHBPeriphClockCmd(RCC_AHBPeriph_DMA1, ENABLE );
	DMA_InitTypeDef DMA_InitStructure;
	DMA_InitStructure.DMA_BufferSize = GFX_XRES;
	DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralSRC;
	DMA_InitStructure.DMA_M2M = DMA_M2M_Enable;
	DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)GfxBuffer;
	DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;
	DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Enable;
	DMA_InitStructure.DMA_Mode = DMA_Mode_Normal;
	DMA_InitStructure.DMA_MemoryBaseAddr = (uint32_t)&GPIOA->ODR;
	DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
	DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Disable;
	DMA_InitStructure.DMA_Priority = DMA_Priority_Medium;
	DMA_Init(DMA1_Channel5, &DMA_InitStructure);

Your description of the MSIZE / PSIZE matches what I understood. However, in this code snippet, I read DataSize_Word and write DataSize_Byte which is definitely faster than reading Byte and writing Byte (I can visually see my pixel image get wider). That is not what I expected..
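
For anyone reading along at register level, the init structure above should boil down to the channel control register value sketched below. The macros are defined locally for illustration (they are not the ST header names), with bit positions taken from the DMA_CCRx description in RM0008; the enable bit is left out because DMA_Init does not start the channel.

```c
#include <assert.h>
#include <stdint.h>

/* DMA_CCRx bit positions on the STM32F1, per the RM0008 register description. */
#define CCR_DIR       (1u << 4)              /* 0 = read from the "peripheral" address */
#define CCR_PINC      (1u << 6)              /* increment the peripheral-side address */
#define CCR_MINC      (1u << 7)              /* increment the memory-side address */
#define CCR_PSIZE(n)  ((uint32_t)(n) << 8)   /* 0 = 8-bit, 1 = 16-bit, 2 = 32-bit */
#define CCR_MSIZE(n)  ((uint32_t)(n) << 10)  /* same encoding as PSIZE */
#define CCR_PL(n)     ((uint32_t)(n) << 12)  /* 0 = low, 1 = medium, 2 = high, 3 = very high */
#define CCR_MEM2MEM   (1u << 14)

/* CCR value matching the posted configuration: M2M, medium priority,
 * 32-bit incrementing source on the peripheral side (GfxBuffer),
 * 8-bit fixed destination on the memory side (GPIOA->ODR). */
uint32_t ccr_for_posted_config(void)
{
    uint32_t ccr = CCR_MEM2MEM | CCR_PL(1) | CCR_PSIZE(2) | CCR_MSIZE(0) | CCR_PINC;
    assert((ccr & CCR_DIR) == 0u);   /* DMA_DIR_PeripheralSRC leaves DIR clear */
    assert((ccr & CCR_MINC) == 0u);  /* destination address stays on ODR */
    return ccr;
}
```

Comparing this against the value actually read back from the channel's CCR is a quick way to confirm which side is 32-bit and which is 8-bit.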

Danish1
Lead III
December 3, 2021

Sorry I was wrong about the DMA doing data packing / unpacking. As Jan said, it does read the RAM word each time and writes just one byte to GPIO.

But you still haven't answered what clock rate APB2 is running at. It might be only 36 MHz even if AHB is running at 72 MHz.

It will be in the RCC section

Danish

Elektraglide
Associate II
December 3, 2021

I'm using the system_stm32f10x.c. I single step through the code and see:

 /* HCLK = SYSCLK */
 RCC->CFGR |= (uint32_t)RCC_CFGR_HPRE_DIV1;
 
 /* PCLK2 = HCLK */
 RCC->CFGR |= (uint32_t)RCC_CFGR_PPRE2_DIV1;
 
 /* PCLK1 = HCLK */
 RCC->CFGR |= (uint32_t)RCC_CFGR_PPRE1_DIV2;

Which I read as APB1 running at 36 and APB2 running at 72, right?

Re: DMA data packing. Any idea why setting the DMA source (SRAM) to Word size is around 10-20% faster than Byte? Everything I've read, and everything I hear in this thread, says there is no FIFO to benefit from reading 4 bytes (i.e. the extra bytes are always tossed).

Elektraglide
Associate II
December 4, 2021

Perhaps a simpler solution: Can anyone point me at some code example that sets up DMA1 running at 72MHz?

Elektraglide
Associate II
December 4, 2021

[Image: scope trace] All my PWM calculations are based on a 72MHz timer and the scope confirms I get precisely what I expect (2us pulse width), so I believe APB2 is running at full speed.

waclawek.jan
Super User
December 4, 2021

We have no idea what you display on that scope.

For the DMA, read AN2548. As the write is traversing an AHB-to-APB bridge, the pattern is different than with AHB-to-AHB. The detailed timing of this is unfortunately not trivial. Also, AN2548 does not deal in detail with the timing of M2M DMA; it may be different from M2P/P2M timings.

Corollary is, you have to take it as it is. ST might be willing to supply additional details if you represent significant buying power to them, as expressed in $M+.

With your displaying through DMA, try to transmit a recognizable pattern (e.g. 0-1-2-3-4) and observe it using a logic analyzer (maybe the 4-channel oscilloscope will suffice, but you may want to experiment with the patterns to get a recognizable sync, e.g. a long 0, and then an unambiguously recognizable sequence, i.e. one which could not be confused with the sequence you get if you transmit only every 4th byte, which you probably do).
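
One possible way to build such a pattern is sketched below: a run of zeros as the sync, followed by a counter that never returns to zero. The function name and the choice of sync length are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `buf` with `sync_len` zero bytes followed by 1, 2, 3, ...
 * The counter wraps from 255 back to 1, so zero stays reserved for the sync run. */
void fill_test_pattern(uint8_t *buf, size_t len, size_t sync_len)
{
    assert(buf != NULL);
    uint8_t value = 1u;
    for (size_t i = 0; i < len; i++) {
        if (i < sync_len) {
            buf[i] = 0u;                      /* long, easily spotted sync */
        } else {
            buf[i] = value;
            value = (value == 255u) ? 1u : (uint8_t)(value + 1u);
        }
    }
}
```

If the pins show 1, 2, 3, ... after the sync, every byte is going out; if they step by 4 (for example 1, 5, 9 with a sync length that is a multiple of 4), only every fourth byte is being transmitted.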

JW

Elektraglide
Associate II
December 5, 2021

[Mea culpa! I've pasted the code that generates this 2uS pulse scope trace further down the thread]

And yes, I think M2M DMA performance is more nuanced than I read in the manuals.

Will try the DMA experiment. I think I need to configure the DMA to be DRQ-ed from the GPIO and not use M2M-mode to get 72MHz transactions.

[EDIT: Displaying black & white vertical stripes so I can measure pixel data easily, I measure 139ns / pixel - which confirms 5.00 cycles / byte @ 36MHz. hmm]
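
The 139 ns figure can be read two ways, and a little integer arithmetic makes that explicit: it is 5 cycles of a 36 MHz clock, but equally 10 cycles of the 72 MHz HCLK, which is the per-transfer count discussed earlier in the thread. The helper below is a made-up name for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of clock cycles, scaled by 1000, that fit into a measured period.
 * E.g. a return value of 5004 means 5.004 cycles. */
uint64_t cycles_x1000(uint32_t period_ns, uint32_t clk_hz)
{
    assert(clk_hz > 0u);
    /* cycles = period_ns * 1e-9 * clk_hz; times 1000 gives a divide by 1e6 */
    return ((uint64_t)period_ns * clk_hz) / 1000000ULL;
}
```

So the measurement on its own does not distinguish "DMA clocked at 36 MHz, 5 cycles" from "DMA clocked at 72 MHz, 10 cycles".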

[ I've told my kids I want a more modern STM32 for Christmas! Though I have really enjoyed the boxed-in challenges of F1..]

Thanks again for all your help here.

Elektraglide
Associate II
December 5, 2021
	// setup PB0 driven by TIM3_CH3 PWM
	//
	GPIO_AFConfigure(AFIO_TIM3_NO_REMAP);	
	RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOB, ENABLE);
	GPIO_InitTypeDef GPIO_InitDef;
	GPIO_StructInit(&GPIO_InitDef);
	GPIO_InitDef.GPIO_Pin = GPIO_Pin_0;
	GPIO_InitDef.GPIO_Mode = GPIO_Mode_AF_PP;		// driven by TIM3_CH3 PWM
	GPIO_InitDef.GPIO_Speed = GPIO_Speed_50MHz;
	GPIO_Init(GPIOB, &GPIO_InitDef);
	GPIO_SetBits(GPIOB, GPIO_Pin_0);
 
	// setup TIM3 with period 2048 ticks
	//
	RCC_APB1PeriphClockCmd(RCC_APB1Periph_TIM3, ENABLE);
	TIM_Cmd(TIM3, DISABLE);
	TIM_TimeBaseInitTypeDef timerInitStructure;
	timerInitStructure.TIM_Prescaler = 0;
	timerInitStructure.TIM_CounterMode = TIM_CounterMode_Up;
	timerInitStructure.TIM_Period = 2048;
	timerInitStructure.TIM_ClockDivision = TIM_CKD_DIV1;
	timerInitStructure.TIM_RepetitionCounter = 0;
	TIM_TimeBaseInit(TIM3, &timerInitStructure);
	TIM_ARRPreloadConfig(TIM3, ENABLE);
 
	// TIM3 CH3 PWM
	//
	TIM_OCInitTypeDef outputChannelInit;
	TIM_OCStructInit(&outputChannelInit);
	outputChannelInit.TIM_OCMode = TIM_OCMode_PWM2;
	outputChannelInit.TIM_OutputState = TIM_OutputState_Enable;
	outputChannelInit.TIM_Pulse = HSYNCPULSE;
	TIM_OC3Init(TIM3, &outputChannelInit);
	TIM_CtrlPWMOutputs(TIM3, ENABLE);

This code fragment sets up PB0 to be output of TIM3_CH3 and has TIM3 timer period of 2048 and CH3 as PWM mode.

Since TIM3 is on APB1 I would expect the period of the PB0 output to be 2048/36000000 = 56.8uS

What I measure is 28.4uS (see scope trace). I am clearly misunderstanding something.

Elektraglide
Associate II
December 5, 2021

[Apologies for answering my own question, but I now think I understand the STM32 behaviour]

Much of the docs refer to APB1 and APB2 and the different clock rates. Specifically, TIM1 & TIM8 being on APB2 (72MHz), other timers being on APB1 (36MHz).

I am sure I am not the only person to read this as being that, say, TIM3 is going to be clocked at the rate set by APB1.

This is not the case.

In the example code above, because the TIM3 CH3 PWM is driving the GPIOB (on APB2), the timing of TIM3 is also at 72MHz.

The same code using TIM1 driving GPIO with PWM, has the same 72MHz timing as TIM3. Now I know! Is that explained anywhere?
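
A note on where this is documented: the clock tree figure in the RCC chapter of RM0008 states that when an APB prescaler is 1 the timers on that bus run at the APB frequency, and otherwise at twice that frequency. With APB1 at HCLK/2 that alone would account for TIM3 counting at 72 MHz, independent of which bus the output pin sits on. A minimal sketch of the rule, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Timer clock on an APB bus: equal to PCLK when the APB prescaler is 1,
 * otherwise 2 x PCLK (clock tree note in RM0008). */
uint32_t timer_clock_hz(uint32_t pclk_hz, uint32_t apb_divider)
{
    assert(apb_divider >= 1u);
    return (apb_divider == 1u) ? pclk_hz : 2u * pclk_hz;
}

/* Period of an up-counting timer in nanoseconds (rounded down).
 * The counter runs 0..ARR, i.e. ARR + 1 ticks, each (PSC + 1) timer clocks long. */
uint64_t timer_period_ns(uint32_t tim_clk_hz, uint32_t psc, uint32_t arr)
{
    assert(tim_clk_hz > 0u);
    uint64_t ticks = ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
    return (ticks * 1000000000ULL) / tim_clk_hz;
}
```

With PCLK1 = 36 MHz and a /2 prescaler the timer clock comes out at 72 MHz, and TIM_Period = 2048 with TIM_Prescaler = 0 gives roughly 28.46 us, in line with the 28.4 us measured.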