
Parallel synchronous transmission using DMA affected by CPU overhead

RCata
Associate III

Hi friends,

I have DMA2 moving data from a look-up table to GPIOB. The DMA is triggered by TIM8, which generates a constant sampling period.

The system works fine, but when the main loop contains some instructions (dummy instructions that don't affect the DMA setup), the DMA starts misbehaving and loses samples.

Is there a solution for this?

Thanks.

1 ACCEPTED SOLUTION

> What is the maximum DMA jitter I can expect?

That's a very hard question to answer, and it depends on various things (I assume you've already read AN4031).

The total latency is the sum of three contributions (so in the worst case their jitter adds up too): 1. latency in the DMA between trigger and start of transfer, 2. delays due to conflicts on the source bus, 3. delays on the destination bus.

Latency 1 is given by other active streams in the DMA - even if this particular stream has the highest priority, it has to wait until the currently active stream finishes its job. So worst case delay/jitter is given by the longest lasting other-stream DMA transfer (which again has constituents 2. and 3.).

Item 2 is the easiest - if you put the samples array into, say, RAM2 and no other busmaster accesses RAM2 (i.e. no variables there are accessed by the processor, the stack is not there, no other DMA goes there), latency is low (maybe one cycle) and there's no jitter. Even if there were some conflict, SRAMs are fast, so latency and jitter amount to only a few (1 to 3) cycles.

Item 3 is probably the trickiest, as the GPIOs sit on AHB1, which contains almost all peripherals (most of them indirectly behind AHB/APB bridges, but they are still accessed by the masters through AHB1). If, say, the processor tries to read some peripheral on APB, this request has to cross the AHB/APB bridge, is slowed down by resynchronization to the possibly slower APB bus, waits until the addressed peripheral returns the answer, and waits again until that answer propagates back through the AHB/APB bridge. Depending on the AHB/APB divider, this may take even dozens of cycles. And there are even worse cases; maybe the most prominent is the RTC, where in certain cases the wait lasts several RTC clocks - and the RTC is clocked from LSE, i.e. at 32 kHz... But that's probably the most extreme case, and there are probably no other similar ones.
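As a rough illustration of how the three contributions above add up, here is a minimal sketch of a latency budget. The struct, the function names and every number used with it are hypothetical; real worst-case values have to be measured or derived for the actual system.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical worst-case contributions, in AHB clock cycles.
   The fields mirror items 1 to 3 above; the values are up to the user. */
typedef struct {
    uint32_t trigger_to_start; /* 1. waiting for another stream to finish */
    uint32_t source_bus;       /* 2. conflicts while reading the table    */
    uint32_t destination_bus;  /* 3. conflicts on the bus to the GPIO     */
} dma_latency_t;

/* The worst-case latency is simply the sum of the three contributions. */
uint32_t dma_worst_case_cycles(dma_latency_t l)
{
    return l.trigger_to_start + l.source_bus + l.destination_bus;
}

/* Convert a cycle count to nanoseconds for a given bus clock (rounds down). */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t hclk_hz)
{
    assert(hclk_hz != 0u);
    return (uint32_t)(((uint64_t)cycles * 1000000000ull) / hclk_hz);
}
```

For example, assuming a 216 MHz bus clock (an assumption, the clock is not stated in the thread), a 500 ns sampling period is 108 cycles, so a summed worst case of a few dozen cycles is already a significant fraction of one sample.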

JW


15 REPLIES
S.Ma
Principal

Which STM32 are we talking about here?

Alex R
Senior

Also, what sampling rate are you using?

Could you use the DMA circular buffer?

Do you use DMA interrupts? Timer interrupts?

Jack Peacock_2
Senior III

The DMA unit shares access to the bus matrix with the CPU. If both the CPU and DMA are attempting to access data from the same memory region (i.e. same bus), one of them has to wait.

Depending on what STM32 you are using you can avoid bus conflicts through the matrix if you place your DMA table in a section of SRAM that is not likely to be accessed by CPU for data or instructions. This is why there are multiple SRAM banks and TCM memory.

Is your stack in the same SRAM region as the DMA transfer? High probability of conflicts in that case. Even worse if executing code from same SRAM region.

Ideally, if you have multiple SRAM regions and are careful in memory layout then DMA and CPU can run in parallel with no bus conflicts. Of course, if this is continuous DMA then you can never access the SRAM with DMA data without risking a conflict....
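A minimal sketch of what such a placement could look like with a GCC-style toolchain. The section name `.sram2_lut`, the table length and the function name are assumptions for illustration; the attribute only helps if the linker script actually maps that section to a separate SRAM bank.

```c
#include <stdint.h>

/* ".sram2_lut" is a hypothetical section name. It must be defined in the
   linker script and mapped to a dedicated SRAM bank to have any effect.
   On a non-ARM host build the attribute is dropped so the sketch compiles. */
#if defined(__GNUC__) && defined(__arm__)
#define LUT_SECTION __attribute__((section(".sram2_lut")))
#else
#define LUT_SECTION
#endif

#define LUT_LEN 256u  /* hypothetical table length */

/* Table read by the DMA; 16-bit entries, assuming 16-bit transfers
   to the GPIO output register. */
LUT_SECTION uint16_t dma_lut[LUT_LEN];

/* Fill the table once, before the DMA stream is enabled, so the CPU does
   not have to touch this bank while transfers are running. */
void dma_lut_fill_ramp(void)
{
    for (uint32_t i = 0; i < LUT_LEN; i++) {
        dma_lut[i] = (uint16_t)i;
    }
}
```

The stack, heap and ordinary variables would then stay in the default data region, so CPU accesses and DMA reads go to different banks.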

Jack Peacock

RCata
Associate III

Hi, it is an STM32F722ZE (NUCLEO board).

The maximum sampling rate corresponds to a period of about 500 ns (extreme case).

The DMA is already in circular mode.

The TIM8-DMA2 path is entirely hardware-driven and uses no interrupts of any kind. TIM8 TRGO triggers the DMA without an interrupt.

Thanks

Hi, I didn't know about that memory feature, and I don't do any manual memory placement; I leave it to the compiler. I'm using an STM32F722.

Currently the code executes from flash and the data sit in the data-memory region, but the cause, as you say, seems to be simultaneous CPU and DMA access to the same SRAM region.

I will try placing the look-up table in a different SRAM region from the rest of the data.

Thanks, Jack! I will be back with news.

How do you observe the problem, exactly?

> TIM8 TRGO fires the DMA

TRGO? That's a bit strange. Why that choice? What is the source of that TRGO, i.e. what's in TIM8_CR2.MMS? Are there any channels brought out as signals to pins? Is TIM8 chained, as master or as slave to some other timer, or does it run completely freestanding?

JW

Okay waclawek,

TIM8 runs freely, since its clock input is the AHB1 clock. It is not controlled as a slave.

I am using CC4 as the DMA trigger source and CH4 as the strobe signal for the external hardware. As a result, the strobe signal and the data are not synchronized.

// This is the DMA config:

DMA2_Stream7->PAR=(uint32_t)&GPIOB->ODR;   // Peripheral address is GPIOB->ODR

DMA2_Stream7->CR=0x0E032D40;               // Stream7 Channel7 = TIM8_CH4, no double buffer, maximum priority, no peripheral increment, memory increment, circular, MEM->PERI, DMA is flow controller, no interrupts

// TIM8:

TIM8->DIER=0x00001000;         // Enable DMA request from CC4 (CC4DE=1)

TIM8->SMCR=0x00000000;

TIM8->CR1=0x00000000;

// TIM8->CR2 is left unconfigured, so all its bits are '0'
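To make the magic number easier to check, here is a hedged sketch that rebuilds the same DMA_SxCR value from named bit fields. The macro names are local to this example (they are not the device-header names), and the bit positions are as I read them for the STM32F4/F7 stream DMA, so verify them against the reference manual. Decoded this way, the value also selects 16-bit peripheral and memory transfer sizes, which the original comment does not mention.

```c
#include <stdint.h>

/* Local, illustrative names for DMA_SxCR fields (not the CMSIS names). */
#define CR_DIR_M2P    (1u << 6)            /* DIR = 01: memory to peripheral */
#define CR_CIRC       (1u << 8)            /* circular mode                  */
#define CR_MINC       (1u << 10)           /* memory address increment       */
#define CR_PSIZE_16   (1u << 11)           /* PSIZE = 01: 16-bit peripheral  */
#define CR_MSIZE_16   (1u << 13)           /* MSIZE = 01: 16-bit memory      */
#define CR_PL_VHIGH   (3u << 16)           /* PL = 11: very high priority    */
#define CR_CHSEL(n)   ((uint32_t)(n) << 25) /* request channel selection     */

/* Local name for the TIM DIER bit that enables the CC4 DMA request. */
#define DIER_CC4DE    (1u << 12)

/* Compose the stream configuration used in the post above (EN still 0). */
uint32_t dma_stream7_cr(void)
{
    return CR_CHSEL(7u) | CR_PL_VHIGH | CR_MSIZE_16 | CR_PSIZE_16 |
           CR_MINC | CR_CIRC | CR_DIR_M2P;
}
```

With these definitions `dma_stream7_cr()` evaluates to 0x0E032D40 and `DIER_CC4DE` to 0x1000, matching the values written in the post.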

It sounds like your problem is not samples going missing entirely, but that the data reach GPIOB later than the timer outputs the strobe edge.

You could then try triggering the DMA from a different TIM8 channel, set so that it fires several cycles before CH4 outputs the strobe. You can then fine-tune that delay according to the requirements of your externally attached hardware, compensating for the jitter caused by other busmasters accessing the memories and other buses (namely the AHB bus containing the GPIOs).
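A minimal sketch of how the compare value for such an earlier-firing channel could be computed, assuming an upcounting timer whose compare events occur when the counter equals the compare register. The function name and the example numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Compare value for a channel that should match `lead` timer ticks before
   the strobe channel, wrapping around the counter period (ARR + 1).
   Assumes a 16-bit timer, so the arithmetic cannot overflow in 32 bits. */
uint32_t lead_compare(uint32_t strobe_ccr, uint32_t lead, uint32_t arr)
{
    uint32_t period = arr + 1u;
    assert(arr <= 0xFFFFu);
    assert(strobe_ccr <= arr && lead < period);
    return (strobe_ccr + period - lead) % period;
}
```

On the target this might be used as `TIM8->CCR3 = lead_compare(TIM8->CCR4, 10u, TIM8->ARR);`, with the DMA request enabled on that channel instead of CC4. The DMA stream and channel then have to be the ones mapped to that timer channel in the reference manual's DMA request table, and the lead of 10 ticks is only a starting point to tune.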

JW