
CPU to memory/peripheral and DMA to memory/peripheral bandwidth

Singh.Harjit
Senior II

ST team, can you share memory performance (bandwidth) between:

1) The M4 core and SRAM memory on an unloaded system?

2) A DMA channel doing memory to memory move?

3) A DMA channel writing to a peripheral register from memory?

Or put another way, how many MB/s can I read from, write to SRAM using the CPU and using the DMA engine in a completely unloaded system?

Thank you.

5 REPLIES
Danish1
Lead III

Do you have a specific processor in mind? It very much depends on clock frequency.

Why the M4 core when there are also the STM32F7 and STM32H7, which can run at higher clock rates?

Have you actually tried to do such measurements yourself?

Embedded microcontrollers do not have that much built-in RAM (I know of some with 1 MB). Raw bandwidth therefore isn't much use because the RAM will quickly fill up, and if it takes a long time to process the information then that's not so exciting.

What this sounds like is someone fishing for numbers in order to shout "my processor is better than yours". Please, original poster, prove me wrong and say how the numbers will be used.

MasterT
Lead

Check out application note AN4031, "Using the STM32F2, STM32F4 and STM32F7 Series DMA controller".

Singh.Harjit
Senior II

@MasterT Thank you for the pointer to the app note. There is a similar one covering the STM32G474 (AN2548) that has some great info.

Good questions. Let me provide more context.

Background:

I'm driving multiple brushless motors from one microcontroller. For the control algorithm I want to use (field oriented control), for each motor, I need to read two motor winding currents, the motor position and the supply voltage at the same time.

The hardware consists of current sensors connected to the analog inputs on the MCU. Because of the motor inductance, I have to run the PWMs at 100kHz. This means the ADC conversion has to be fast. For the ADC conversion to be fast, we need a small sampling time. For a small sampling time, we need a low impedance source. For the low impedance source, I'm using the microcontroller's built-in op-amps in voltage follower mode. At the ADC clock freq. I'm using, I can do the ADC conversion in approximately 300ns.

Since I'm using the built-in op-amps and I want to support multiple motors, I cannot use the ADC's input multiplexer; instead I have to use the op-amps' input multiplexer.

So, whenever I want to read the motor winding currents and motor position, I have to change the op-amp input multiplexer and I have to trigger reading of the motor position.

My approach is to use dual ADCs to simultaneously sample the motor winding currents. When you use dual ADCs and do a regular conversion, you have the choice of the ADC unit generating one DMA request per ADC channel (so, two DMA requests total) or one DMA request for both channels.

The plan is to use the two DMA requests as follows:

1) The first ADC DMA request transfers the ADC results from both ADCs to memory using the common data register.

2) The second ADC DMA request starts the motor position read. This DMA request then chains to the DMA request generator to change the address such that the next DMA trigger will read the other motor's position. It then chains again to the DMA request generator to change the op-amp mux to the other motor.

The timing of the system is such that I need to update the DMA target for #2 within the ADC conversion time (300ns).

From reading application note AN2548, we know it takes seven clock cycles to do one DMA transfer. The MCU is running at 170MHz. This means it will take approx. 42ns to transfer 32 bits from memory to the peripheral register. Given the 300ns conversion time and the 42ns per DMA transfer, I can "easily" do six 32-bit transfers with a little extra room.
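The arithmetic above can be checked with a short script. This is only a sketch: the clock, the seven cycles per transfer and the 300ns window are the figures quoted in this post, the function names are made up for illustration, and bus contention is ignored.

```python
# Back-of-envelope DMA timing budget, using the figures quoted in the post.
CLOCK_HZ = 170_000_000        # MCU clock quoted in the post
CYCLES_PER_TRANSFER = 7       # cycles per DMA transfer, per AN2548 as quoted
ADC_CONVERSION_NS = 300       # approximate ADC conversion time from the post


def transfer_time_ns(cycles=CYCLES_PER_TRANSFER, clock_hz=CLOCK_HZ):
    """Time for one DMA transfer in nanoseconds."""
    return cycles * 1e9 / clock_hz


def transfers_in_window(window_ns=ADC_CONVERSION_NS):
    """Whole back-to-back transfers that fit in the window (no contention)."""
    return int(window_ns // transfer_time_ns())


print(f"one transfer: {transfer_time_ns():.1f} ns")
print(f"transfers in {ADC_CONVERSION_NS} ns: {transfers_in_window()}")
```

Seven cycles at 170MHz comes out a little under the rounded 42ns, so the six transfers mentioned above fit with some margin, assuming nothing else is competing for the bus.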

The DMA register layout is such that the control register (which has the enable bit) is at a lower address than the peripheral and memory address registers, and the DMA only allows the address to be incremented. So, this means I can:

* Update the peripheral and memory address registers and then use another DMA request generator to update the DMA control register.

OR

* Update the control register and the peripheral and memory address registers (three) sequentially but only if the address registers can be changed after the DMA channel is enabled.
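To make the two options concrete, here is a small model of which registers an incrementing-destination burst would touch. It assumes the per-channel register order CCR, CNDTR, CPAR, CMAR used in ST's CMSIS device headers (please verify against the reference manual for your part); the helper name is hypothetical.

```python
# Assumed per-channel register order (CMSIS DMA_Channel_TypeDef style).
# Offsets are relative to the channel's first register; verify in the RM.
CHANNEL_REGS = [("CCR", 0x00), ("CNDTR", 0x04), ("CPAR", 0x08), ("CMAR", 0x0C)]


def regs_written(start_reg, n_words):
    """Registers hit by n_words 32-bit writes with an incrementing destination."""
    offsets = {name: off for name, off in CHANNEL_REGS}
    by_offset = {off: name for name, off in CHANNEL_REGS}
    start = offsets[start_reg]
    return [by_offset.get(start + 4 * i, "<outside channel>")
            for i in range(n_words)]


print(regs_written("CPAR", 2))  # first option: address registers only
print(regs_written("CCR", 4))   # second option: control register first
```

If that layout holds, the transfer-count register sits between the control register and the address registers, so a single burst starting at the control register would cover four words rather than three.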

What works in my favor is that I *know* that the trigger will not occur before I can make the change. I can guarantee this because of the timing.

From reading various documents and thinking about/speculating on how the DMA controller is implemented, I don't think the peripheral and memory addresses are cached anywhere. The reason is that the DMA unit takes two clock cycles at a DMA request to do some arbitration and prepare the transfer, and it increments the registers at the end of the DMA transfer so that it can support arbitration. So, this tells me that you can change the addresses as much as you want until the request shows up. BUT once a transfer starts, the address registers may be ignored because the DMA unit has to keep a "current" address.

Why the STM32G4 MCU?

I did consider the STM32H7 series and decided not to go with it because it has much higher power consumption and doesn't have the peripherals I wanted. I did benchmark the STM32H7, and @ 480MHz it is 3x to 4x faster than the STM32G4 @ 170MHz. To some degree, the longer pipeline in the M7 and its in-order dual issue counteract each other.

In any case, if you have data/results, on changing the DMA pointers after the DMA is enabled but before a trigger or memory throughput via the CPU and DMA, I'd love to see it.

PS: My math says that if a DMA transfer takes 42ns for 32bits, at 100% DMA utilization, we can get 95MB/s. This would be with no contending CPU accesses. I think with contending CPU accesses and assuming they use the same AHB bus, then, the max. throughput will be cut in half (47MB/s) due to round-robin bus arbitration between the CPU and DMA.
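The same estimate as a quick script, for anyone who wants to plug in other numbers. It is a rough sketch: the 42ns and 32-bit figures are from this post, the 50% share under contention is the assumption stated above rather than a measured value, and MB here means 10^6 bytes.

```python
def dma_throughput_mb_s(bytes_per_transfer=4, transfer_ns=42.0, dma_share=1.0):
    """Estimated DMA throughput in MB/s (1 MB = 10**6 bytes).

    dma_share is the fraction of bus slots the DMA gets: 1.0 with no
    contention, 0.5 under the assumed even round-robin split with the CPU.
    """
    bytes_per_second = bytes_per_transfer / (transfer_ns * 1e-9)
    return dma_share * bytes_per_second / 1e6


unloaded = dma_throughput_mb_s()
contended = dma_throughput_mb_s(dma_share=0.5)
print(f"DMA alone:        {unloaded:.1f} MB/s")
print(f"with CPU traffic: {contended:.1f} MB/s")
```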

I have the schematics for the design but haven't done the layout. As soon as I do that and make a board, I'll be able to get this data.

I'm very sure ST has this data (we used to have it for the custom silicon we did), and I'm hopeful they or the community can share some results.

> 2) The second ADC DMA request starts the motor position read. This DMA request then chains to the DMA request generator to change the address such that the next DMA trigger will read the other motor's position. It then chains again to the DMA request generator to change the op-amp mux to the other motor.

Please elaborate.

What exactly is "motor position read"? (Consider me dumb: I don't control motors and I don't know how to read their position).

How do you want to "DMA request chain to the DMA request generator", in terms of DMA/DMAMUX or other registers' values?

JW

The motor position read is done using an SPI read. The DMA writes a 16-bit word to the SPI data register, which starts an SPI transfer that reads the data back from the encoder chip.

RE: "DMA request chain...", I was planning to use the DMA channel event (dmamux_evtx) to generate a request (DMAMUX_Req Gx) into another DMA channel. This part has four DMA event/generators.