2020-08-11 12:39 PM
Hi,
I would like to send parallel byte data from an STM32H743 using DMA, e.g. on port pins PG0...PG7, with a timer defining the clock (e.g. TIM3 CH2).
For STM32L and STM32F4 this is described in AN4666, I think. I already did this on an STM32F4, where it seemed quite straightforward.
I used the following code:
DMA1_Stream7->CR= (0 << DMA_SxCR_MBURST_Pos) |
                  (0 << DMA_SxCR_PBURST_Pos) |
                  (0 << DMA_SxCR_DBM_Pos) |
                  (2 << DMA_SxCR_PL_Pos) |
                  (2 << DMA_SxCR_MSIZE_Pos) |
                  (2 << DMA_SxCR_PSIZE_Pos) |
                  (1 << DMA_SxCR_MINC_Pos) |
                  (0 << DMA_SxCR_PINC_Pos) |
                  (0 << DMA_SxCR_CIRC_Pos) |
                  (1 << DMA_SxCR_DIR_Pos) |
                  (0 << DMA_SxCR_PFCTRL_Pos);
// 24= TIM3_CCR2
// 27= TIM3_UP
DMAMUX1_Channel7->CCR= (24 << DMAMUX_CxCR_DMAREQ_ID_Pos);
DMA1_Stream7->M0AR=(DWORD)&adwDmaDebug;
DMA1_Stream7->PAR= (DWORD)&GPIOG->ODR;
DMA1_Stream7->FCR= 0;
To start the DMA sequence with 32 words, I use the following:
DMA1_Stream7->NDTR= 32;
DMA1_Stream7->CR |=(1 << DMA_SxCR_EN_Pos);
TIM3 CH2 is configured in basic PWM mode to output 1 Hz (I also tried 1 MHz, which is what I want to use later, but I thought I would start with a slow frequency).
When I do this, NDTR decreases to 31, and from then on the DMA1->HISR register always shows 0x00400000 (DMA FIFO error for Stream7) and nothing more happens. NDTR does NOT decrease any further, although the timer CH2 pin keeps running nicely at 1 Hz. The DMA FIFO is not used in my code, as I use direct mode, so there is NO FIFO. I also tried other FCR_FCH settings and other MSIZE/PSIZE values, but with NO success.
Just one thing DOES happen: the ODR switches from my init value 0xFF to zero. But this "zero" does NOT depend on the value at my memory location. Even if I put 0x03030303 in memory, I still get zero in GPIOG->ODR.
I placed adwDmaDebug at a "32-Byte-aligned" address in the range 0x24000000 ... . This should be fine for DMA?
2020-08-11 01:53 PM
> DMA1_Stream7->M0AR=(DWORD)&adwDmaDebug;
I bet this is the problem. Show your definition of adwDmaDebug. Probably this points to the stack / DTCM, which DMA can't access.
Listen to jan
2020-08-11 03:48 PM
I don't think so... OP said adwDmaDebug is at a "32-Byte-aligned" address in the range 0x24000000, i.e. AXI SRAM, which is maybe not optimal for DMA1 but should be viable.
Also, inaccessible source/target would lead to Transfer Error.
FIFO error with Direct Mode indicates that the trigger was set *before* DMA was enabled (see Note in ch.4.3 in AN4031 rev.3, somewhat cumbersome wording there). DMA "obeys" that trigger by performing the Peripheral-side transfer without performing prior Memory-side transfer, that explains the "incorrect" value in GPIO_ODR. However, in direct mode this error gets ignored by DMA and it continues with transfers as normal, at least that's what happens in the 'F4 in such circumstances.
I'd have a closer look at readback of DMAMUX and TIM registers.
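For readers who want to decode the status value quoted above, here is a minimal sketch. It assumes the usual DMA_HISR bit layout for stream 7 (FEIF7 at bit 22, DMEIF7 at bit 24, TEIF7 at bit 25, HTIF7 at bit 26, TCIF7 at bit 27). The macro and function names are made up for illustration and are not taken from the ST headers. It shows that 0x00400000 contains only the FIFO error flag and no transfer error flag.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions of the stream 7 flags in DMA_HISR (illustrative names). */
#define HISR_FEIF7  (1u << 22)  /* FIFO error */
#define HISR_DMEIF7 (1u << 24)  /* direct mode error */
#define HISR_TEIF7  (1u << 25)  /* transfer error */
#define HISR_HTIF7  (1u << 26)  /* half transfer */
#define HISR_TCIF7  (1u << 27)  /* transfer complete */

/* Writes the space-separated names of all stream 7 flags set in hisr
 * into buf and returns how many flags were written. */
int decode_hisr_stream7(uint32_t hisr, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } flags[] = {
        { HISR_FEIF7,  "FEIF7"  },
        { HISR_DMEIF7, "DMEIF7" },
        { HISR_TEIF7,  "TEIF7"  },
        { HISR_HTIF7,  "HTIF7"  },
        { HISR_TCIF7,  "TCIF7"  },
    };
    int count = 0;
    size_t used = 0;

    if (len > 0)
        buf[0] = '\0';

    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if ((hisr & flags[i].mask) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                      /* buffer full, stop appending */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling decode_hisr_stream7(0x00400000u, buf, sizeof buf) fills buf with the single name FEIF7, which matches the observation that an inaccessible address (TEIF7) is not what is being reported.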
JW
2020-08-16 07:12 AM
Thank you Jan, this was really very helpful.
In my original code TIM3 really was just "running through" and the DMA was started on the fly while TIM3 was already running. After I changed this (stopping and starting TIM3 correctly), at least the first burst ran without transfer error, and I saw some "very strange" number appearing in GPIOG->ODR, so "something" was happening.
As there are some posts which say that access to GPIO_ODR is only possible with BDMA, I then changed to BDMA with the LPTIM3 output, as this was also quite suitable for my PCB. Then, to my surprise, the same "very strange" number appeared in GPIOG->ODR. By some "nice accident" I then found out that this "strange number" was the init setting of my RAM buffer. Looking at the article "DMA is not working on STM32H7 devices", I added the line "SCB_CleanDCache_by_Addr(...)", and after some further changes it miraculously works now. I also got it running nicely with DMA.
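The cache maintenance step is easier to reason about when the buffer is thought of in whole cache lines. The following is only a sketch: it assumes the 32-byte D-cache line of the Cortex-M7, and the names DCACHE_LINE, cache_range_t and cache_range are hypothetical helpers, not part of CMSIS. It widens an arbitrary buffer to the cache-line-aligned range that a clean operation effectively covers.

```c
#include <stdint.h>
#include <stddef.h>

#define DCACHE_LINE 32u  /* assumed Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;  /* address rounded down to a cache line boundary */
    size_t    size;   /* length rounded up to whole cache lines */
} cache_range_t;

/* Returns the smallest cache-line-aligned range that contains
 * the bytes [addr, addr + size). */
cache_range_t cache_range(uintptr_t addr, size_t size)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE - 1u);
    cache_range_t r;
    uintptr_t end = (addr + size + DCACHE_LINE - 1u) & mask;

    r.start = addr & mask;
    r.size  = (size_t)(end - r.start);
    return r;
}
```

On the target, the resulting start and size would be the arguments handed to SCB_CleanDCache_by_Addr before the DMA is enabled. For a buffer that is already 32-byte aligned and a multiple of 32 bytes long, such as the 2048-byte arrays in this thread, the range comes back unchanged. Some CMSIS versions may already widen the range internally, so it is worth checking the core_cm7.h in use.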
For somebody who has experience with both BDMA and DMA: do you expect any severe drawbacks if I use the more "lazy" DMA approach? My block size will typically be 2048 bytes, sent out with a 1 MHz clock, so one block takes about 2 ms, and I will have to send such blocks every 5-10 ms over a longer period.
I do not want to moan too much again about all these "stupid HAL and/or CubeMX efforts and examples of STM" (in our city at carnival time the "first moaner" is burnt on some procession on every start of carnival, so I know I have to be careful...). But it is really quite nerving that STM does not present any "basic code examples" for such typical applications (without use of this stupid HAL and CubeMX stuff...). From time to time this would really save a lot of development time and make life much easier for beginners (I cannot really imagine how somebody is supposed to get a good start with the STM32H7 without a basic background from the STM32F4; back in the STM32F4 Discovery days the STM examples were really very nice and helpful...).
To help all my "basic code companions" here in the forum, here are the code snippets you will need for the BDMA and DMA solutions:
BDMA-Code solution:
// DMA-Memory: Global Array in SRAM4, 32byte-aligned
BYTE abDmaDebug[2048] ALIGN32 RAM_D3_SRAM4 = {1, 2, 3, 4, 5, 6, 7, 8};

void Init_BDMA(){
    RCC->APB4ENR |= RCC_APB4ENR_LPTIM3EN;
    RCC->AHB4ENR |= RCC_AHB4ENR_BDMAEN;

    for( int i= 0; i< elements( abDmaDebug); i++)
        abDmaDebug[i]= (i+3);
    // Write the buffer back from the D-cache to RAM, so BDMA sees the real data
    SCB_CleanDCache_by_Addr( (uint32_t*)abDmaDebug, sizeof( abDmaDebug));

    #define PIN_DCLK "PA1 TL3"
    LPTIM3->CR= LPTIM_CR_ENABLE;
    #define MHZ_LT1_LT2_LT3_LT4_LT5 100 //MHZ_APB3
    #define FREQU_LT3 1000000 //1MHz
    #define LT3_ARR 100 // MHZ_LT1_LT2_LT3_LT4_LT5*1000000 / FREQU_LT3
    LPTIM3->ARR= LT3_ARR-1;
    LPTIM3->CMP= LT3_ARR/2;

    #define DMA_DCLK "BDMA_Channel0"
    //1=Requestgenerator 0
    DMAMUX2_Channel0->CCR= (1 << DMAMUX_CxCR_DMAREQ_ID_Pos);
    //Request-Generator:
    //12=LPTIM3_OUT (Trigger-Input)
    DMAMUX2_RequestGenerator0->RGCR= (12 << DMAMUX_RGxCR_SIG_ID_Pos) |
                                     (1 << DMAMUX_RGxCR_GPOL_Pos);
    DMAMUX2_Channel0->CCR|= DMAMUX_CxCR_EGE;
    DMAMUX2_RequestGenerator0->RGCR|= DMAMUX_RGxCR_GE;

    BDMA_Channel0->CCR= (2 << BDMA_CCR_PL_Pos) |
                        // SIZE for GPIOG->ODR: 1 or 2 (NOT 0)
                        (0 << BDMA_CCR_MSIZE_Pos) |
                        (1 << BDMA_CCR_PSIZE_Pos) |
                        (1 << BDMA_CCR_MINC_Pos) |
                        (0 << BDMA_CCR_PINC_Pos) |
                        (0 << BDMA_CCR_CIRC_Pos) |
                        (1 << BDMA_CCR_DIR_Pos);
    //ATTENTION! Memory in SRAM4!!
    BDMA_Channel0->CM0AR=(DWORD)&abDmaDebug;
    BDMA_Channel0->CPAR= (DWORD)&GPIOG->ODR;
}

void Fire_BDMA(){
    // Stop the trigger source first, so no request is pending when BDMA is enabled
    LPTIM3->CR= 0;
    BDMA_Channel0->CCR &=~(1 << BDMA_CCR_EN_Pos);
    BDMA->IFCR= (DWORD)-1;
    BDMA_Channel0->CNDTR= elements( abDmaDebug);
    BDMA_Channel0->CCR |=(1 << BDMA_CCR_EN_Pos);
    LPTIM3->CR= LPTIM_CR_ENABLE;
    LPTIM3->CR= LPTIM_CR_ENABLE | LPTIM_CR_CNTSTRT;
}
DMA code solution:
// DMA-Memory: Global Array in SRAM1, 32byte-aligned
BYTE abDmaDebug[2048] ALIGN32 RAM_D2_SRAM1 = {1, 2, 3, 4, 5, 6, 7, 8};

void Init_DMA(){
    #define PIN_DCLK "PA7 T3.2"
    #define DMA_DCLK "DMA1_Stream7"
    RCC->APB1LENR |= RCC_APB1LENR_TIM3EN;
    RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN;

    for( int i= 0; i< elements( abDmaDebug); i++)
        abDmaDebug[i]= (i+3);
    // Write the buffer back from the D-cache to RAM, so DMA sees the real data
    SCB_CleanDCache_by_Addr( (uint32_t*)abDmaDebug, sizeof( abDmaDebug));

    // sorry, I was too lazy here to give the "basic code" for this class
    // TimData8... this just produces a basic PWM with 1MHz on pin T3.2
    #define PIN_DCLK "PA7 T3.2"
    TimData8.Init_RCCok( TIM3, TIMTYP_TIM3);
    TimData8.SetFrequ_Hz( 1000000);
    TimData8.SetPower( 0.5);
    TIM3->DIER |= TIM_DIER_CC2DE;

    #define DMA_DCLK "DMA1_Stream7"
    DMA1_Stream7->CR= (0 << DMA_SxCR_MBURST_Pos) |
                      (0 << DMA_SxCR_PBURST_Pos) |
                      (0 << DMA_SxCR_DBM_Pos) |
                      (2 << DMA_SxCR_PL_Pos) |
                      (0 << DMA_SxCR_MSIZE_Pos) |
                      (0 << DMA_SxCR_PSIZE_Pos) |
                      (1 << DMA_SxCR_MINC_Pos) |
                      (0 << DMA_SxCR_PINC_Pos) |
                      (0 << DMA_SxCR_CIRC_Pos) |
                      (1 << DMA_SxCR_DIR_Pos) |
                      (0 << DMA_SxCR_PFCTRL_Pos);
    // 24= TIM3_CCR2
    // 27= TIM3_UP
    DMAMUX1_Channel7->CCR= (24 << DMAMUX_CxCR_DMAREQ_ID_Pos);
    DMAMUX1_Channel7->CCR|= DMAMUX_CxCR_EGE;
    // ATTENTION! Memory in D2_SRAM1-3
    DMA1_Stream7->M0AR=(DWORD)&abDmaDebug;
    DMA1_Stream7->PAR= (DWORD)&GPIOG->ODR;
    DMA1_Stream7->FCR= 0;
}

void Fire_DMA(){
    // Stop the timer and block its requests, so no trigger is pending when DMA is enabled
    TimData8.Stop( 0);
    TIM3->DIER &= ~TIM_DIER_CC2DE;
    DMAMUX1_Channel7->CCR&= ~DMAMUX_CxCR_EGE;
    DMA1_Stream7->NDTR= elements( abDmaDebug);
    DMA1->HIFCR= (DWORD)-1;
    DMA1_Stream7->CR |=(1 << DMA_SxCR_EN_Pos);
    DMAMUX1_Channel7->CCR|= DMAMUX_CxCR_EGE;
    TIM3->DIER |= TIM_DIER_CC2DE;
    TimData8.Start();
}
2020-08-16 07:53 AM
> at carnival time the "first moaner" is burnt
Oh, I'd be at the stake very soon...
> But it is really quite nerving that STM does not present any "basic code examples" for such typical applications (without use of this stupid HAL and CubeMX stuff...).
You may want to support this request by placing your vote. I believe it won't count as moaning =)
JW
2020-08-16 10:31 PM
Thank you for the link, I just left a support message.
Concerning my question about DMA vs. BDMA for this application: do you have any comments or experience here?
Or is a 1 MHz output frequency so slow that it does not really matter whether I use DMA or BDMA?
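To put rough numbers on this question, here is a small sketch using the figures mentioned earlier in the thread (2048-byte blocks, 1 MHz output clock, one block every 5 to 10 ms). The function names are hypothetical, integer division truncates the results, and the core clock is a parameter because the actual system clock is not stated here.

```c
#include <stdint.h>

/* Duration of one block in microseconds, at one byte per output clock period. */
uint32_t block_time_us(uint32_t bytes, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)bytes * 1000000u) / clock_hz);
}

/* Share of time (in percent, truncated) during which a transfer is running
 * when one block of block_us is started every period_us. */
uint32_t duty_percent(uint32_t block_us, uint32_t period_us)
{
    return (uint32_t)(((uint64_t)block_us * 100u) / period_us);
}

/* Core clock cycles that pass between two consecutive DMA requests. */
uint32_t core_cycles_per_transfer(uint32_t core_hz, uint32_t clock_hz)
{
    return core_hz / clock_hz;
}
```

With these inputs, block_time_us(2048, 1000000) gives 2048 us, the transfer is active for roughly 20 to 40 percent of the time, and with an assumed 400 MHz core clock there are 400 core cycles between two single-byte transfers.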
2020-08-16 10:47 PM
To me it sounds perfectly OK to keep it as far as possible from the "computing core", so BDMA is IMO Ok, although 1MHz DMA won't make much difference in a 400+MHz system. I don't use the H7, though.
JW