The speed of the stm32 GPIO (different series).

Alexey Vergin
Associate II

Dear engineers, I want to read an external parallel ADC that is connected directly to the GPIO ports of an STM32 microcontroller and clocked from it. I need to take 1000 readings from the ADC at maximum speed and write them to STM32 memory. What is the maximum port access speed for the different STM32 series, and which is the better choice: DMA (with the CPU and interrupts disabled for the duration of the DMA operation), or reading with assembler instructions (with all DMA channels and all interrupts disabled, and without a loop, i.e. fully unrolled)?

16 REPLIES

On the 'F4, running from SRAM mapped at 0x0000'0000, storing to the CCM RAM, and using completely unrolled asm, I believe you could achieve 2 or 3 system cycles per read/store. DMA is slower.
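
To put those cycle counts into numbers, here is a minimal arithmetic sketch. It assumes a 168 MHz system clock (the maximum of the 'F405/'F407; other parts differ), and the function names are made up for illustration. Under that assumption, 2 cycles per sample would mean 84 MS/s and 3 cycles 56 MS/s, so 1000 samples at 3 cycles each would take roughly 17.9 µs.

```c
#include <stdint.h>

/* Samples per second for a given system clock and a given number of
 * system cycles spent on each read/store pair (hypothetical helper). */
static uint32_t samples_per_second(uint32_t sysclk_hz, uint32_t cycles_per_sample)
{
    return sysclk_hz / cycles_per_sample;
}

/* Time in nanoseconds (rounded down) needed to capture n samples. */
static uint64_t capture_time_ns(uint32_t sysclk_hz, uint32_t cycles_per_sample,
                                uint32_t n)
{
    return (uint64_t)n * cycles_per_sample * 1000000000ull / sysclk_hz;
}
```

The real cycle count still has to be measured on hardware; this only converts a measured or estimated figure into a sample rate.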

F7 may buy you a few MHz on the raw system clock, but I'm not at all sure how the dual-issue core would dance with the unrolled code. Maybe it'll get to the theoretical 1 cycle, who knows.

But why don't you try it yourself? It's a matter of getting 2 or 3 of the cheapest Nucleo boards.

JW

PS. Change your username to a normal nick.

An additional store to a GPIO register is needed to provide an interface clock signal (or how would they be synchronized?), and both the GPIO read and write go through the AHB bus, which runs on half the core clock.

> An additional store to a GPIO register is needed to provide an interface clock signal (or how would they be synchronized?),

I don't think this ever occurred as a question, as the post was not posed as such.

> both the GPIO read and write go through the AHB bus, which runs on half the core clock.

On which STM32?

On F4, the bus matrix, and thus both AHBs, run at the system clock, which is also the processor (core) clock.

JW

> I don't think this ever occurred as a question

Sure, but without synchronization, the data would be useless. It would be quite tricky to persuade the ADC to toggle its outputs just in time for the reads.

I'm thinking about using FMC or QSPI for the interface. But the question is a bit vague without part numbers.

> On F4, the bus matrix, and thus both AHBs, run at the system clock

Right, I was looking at the H7 at the moment.

> DMA is slower.

Are DMA timings documented somewhere, or is it just experience?

> It would be quite tricky to persuade the ADC to toggle its outputs just in time for the reads.

Not necessarily, the STM32 clock and the ADC clock come from the same source.

> Are DMA timings documented somewhere, or is it just experience?

AN2548 for the single-port DMA, AN4031 for the dual-port.

JW

Alexey Vergin
Associate II

Thanks for the answers. The ADC clock could come from the clock output pin (MCO), or from a timer output.

I still can't imagine how a repeated sequence of ldr,str instructions would be reliably synchronized to the start and phase of a clock signal, with all the caching, prefetching, pipelining involved, even if the clock is generated somewhere in the MCU. If the receiver happens to look at the bus at the same time the transmitter is putting data on it, there'd be no guaranteed setup time, and the receiver might end up with the previous value, the current one, or parts of both. But I was just curious, so never mind.

Anyway, thank you for the references.

No, it is important, and I am not sure myself that the idea will work. But there is hope that on the smaller models (F0-F4), if you turn off everything that can interfere, such as interrupts or DMA, it may succeed. It is important to understand which model to use for which ADC.

It's not *that* complex on the 'F4. There is no cache, except the jump cache (aka ART) on FLASH, which is unimportant for linear code; the ART also contains a small data cache for data reads from FLASH, but that's again unimportant for this case. So once you fill up the pipeline, you run linearly. There's some arbitration in the bus matrix, but if there are no other bus masters involved, it should be regular. GPIOs are on AHB, so there's no AHB/APB bridge involved (which would mean clock domain crossing). Storing to CCM RAM avoids using the bus matrix altogether for half of the process.
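
A C-level sketch of such a fully unrolled read/store sequence might look like the following. It is only an illustration: the thread suggests hand-written asm, and a C compiler gives no guarantee about the instructions or cycle counts it emits. The names `capture_unrolled`, `RD1`, `RD10`, `RD100` and `N_SAMPLES` are hypothetical; on a real part the `idr` argument would be the address of the port's input data register (e.g. `&GPIOx->IDR` with the CMSIS headers), and placing `buf` in CCM RAM would need a suitable linker section, which is not shown.

```c
#include <stdint.h>

#define N_SAMPLES 1000

/* One read of the port followed by one 16-bit store, no loop overhead. */
#define RD1(p, b)   (*(b)++ = (uint16_t)*(p))
#define RD10(p, b)  do { RD1(p, b); RD1(p, b); RD1(p, b); RD1(p, b); RD1(p, b); \
                         RD1(p, b); RD1(p, b); RD1(p, b); RD1(p, b); RD1(p, b); } while (0)
#define RD100(p, b) do { RD10(p, b); RD10(p, b); RD10(p, b); RD10(p, b); RD10(p, b); \
                         RD10(p, b); RD10(p, b); RD10(p, b); RD10(p, b); RD10(p, b); } while (0)

/* Capture N_SAMPLES readings of the lower 16 bits of *idr into buf.
 * The sequence is expanded 1000 times by the preprocessor, so there is
 * no branch inside the capture itself. */
static void capture_unrolled(volatile const uint32_t *idr, uint16_t *buf)
{
    RD100(idr, buf); RD100(idr, buf); RD100(idr, buf); RD100(idr, buf); RD100(idr, buf);
    RD100(idr, buf); RD100(idr, buf); RD100(idr, buf); RD100(idr, buf); RD100(idr, buf);
}
```

Interrupts and other bus masters would have to be disabled around the call, and the generated code should be inspected in the disassembly before trusting any timing.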

I'd expect the "load-store unit" to be 3 cycles long (if it turned out to be 2 cycles, I might add an artificial NOP). I'd start with having the stream synchronized in some way to the ADC outputting a known pattern on the digital side, maybe with one bit line delayed slightly (approx. half a cycle or slightly less), and some form of "tunable delay" on the clock/sync. I would tune it to the "worst case", and then 1.5 cycles away is the best case.
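
The "tune to the worst case, then move half a period away" step could be sketched as below. This is an assumption-laden illustration, not a tested procedure: it supposes that the tunable delay has `steps` settings that together span exactly one sample period, and that for each setting you have counted how many captured samples disagreed with the known pattern. The function name `pick_delay_step` and the error-count array are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* errors[i] = number of mismatches against the known pattern observed
 * with delay setting i. The settings are assumed to wrap around, i.e.
 * setting steps-1 is adjacent to setting 0. Returns the setting that
 * lies half a period away from the worst one. */
static size_t pick_delay_step(const uint32_t *errors, size_t steps)
{
    if (steps == 0) {
        return 0; /* nothing to choose from */
    }

    size_t worst = 0;
    for (size_t i = 1; i < steps; i++) {
        if (errors[i] > errors[worst]) {
            worst = i;
        }
    }
    /* Half a period away from the worst-case sampling point. */
    return (worst + steps / 2) % steps;
}
```

With a 3-cycle read/store unit, half a period corresponds to the 1.5 cycles mentioned above.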

I know where the devil is, and I know this is a genuine case for an FPGA, but then I also know very well that FPGAs are usually a PITA. Back then, we built a cheap LA ticking at up to 66MHz out of a '51, a fast SRAM, and an XC9536.

JW