De-interleaving DMA output from sequentially configured ADCs

Nev
Associate II

I am using all 3 ADCs on an F765 to convert 16 channels of data at 50 ksps. Each ADC is configured to perform a sequence of conversions (5 on ADC1 and ADC2, 6 on ADC3), all triggered from the same timer. I have also configured them to ping-pong the data using the DMA. The ping-pong buffers reside in external SDRAM.

This works fine but, of course, the data arrives interleaved, so it has to be de-interleaved into a linear buffer per channel before I can perform signal processing on each channel.

There is a lot of data to de-interleave (5000 × 2 × 2 × 16 = 320,000 bytes), and 100 ms in which to do all the required processing before the next buffer needs to be serviced. The de-interleaving routine takes about 39 ms after optimisation, leaving only 61 ms for the other filtering operations, which is likely to be a problem.
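For reference, the de-interleave step being timed here, in its straightforward form, looks something like this (a minimal sketch; the function name and buffer layout are assumptions, with each ADC's DMA buffer laid out rank-major):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch (names assumed): one ADC's half-buffer holds its sequence
 * ranks back-to-back, i.e. in[s * n_ranks + r] is sample s of rank r.
 * De-interleaving copies each rank into its own linear buffer. */
static void deinterleave(const uint16_t *in, uint16_t **out,
                         size_t n_ranks, size_t n_samples)
{
    for (size_t s = 0; s < n_samples; ++s)
        for (size_t r = 0; r < n_ranks; ++r)
            out[r][s] = in[s * n_ranks + r];
}
```

With ~80,000 samples per half-buffer across the three ADCs, the cost of this loop is dominated by the memory traffic, which is why where the buffers live matters so much.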

The DMA doesn't seem to allow clever indexing to de-interleave on the fly...am I missing a trick?

Is there something else I should be considering to get around the problem?

 

1 ACCEPTED SOLUTION

Accepted Solutions
TDK
Guru

There's no trick; you have to de-interleave with the CPU. Putting the buffers in onboard RAM rather than SDRAM will speed things up considerably.

Consider smaller buffers in internal RAM which then get pushed out (after de-interleaving) to SDRAM.
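The smaller-buffer idea might be sketched like this (hypothetical names; `N_RANKS` and `CHUNK` are placeholder sizes): the DMA half-transfer and transfer-complete interrupts each hand over a small chunk sitting in fast internal RAM, which is de-interleaved straight into the big per-channel buffers in SDRAM.

```c
#include <stddef.h>
#include <stdint.h>

#define N_RANKS 5    /* conversions per trigger on this ADC (assumed) */
#define CHUNK   64   /* samples per half-buffer in internal RAM (assumed) */

/* Sketch: called from the DMA half/complete callback with a pointer to
 * the half-buffer that just filled.  De-interleaves the small chunk
 * while it is still in internal RAM, appending each rank to its linear
 * per-channel buffer in SDRAM at offset *written. */
static void push_chunk(const uint16_t *chunk, uint16_t *sdram[N_RANKS],
                       size_t *written)
{
    for (size_t s = 0; s < CHUNK; ++s)
        for (size_t r = 0; r < N_RANKS; ++r)
            sdram[r][*written + s] = chunk[s * N_RANKS + r];
    *written += CHUNK;
}
```

The write side still touches SDRAM, but the reads (the strided, cache-unfriendly part) stay in internal RAM.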

If you feel a post has answered your question, please click "Accept as Solution".


8 REPLIES
Nev
Associate II

Thanks for confirming our understanding of the ADC/DMA interaction.

We have considered using the internal tightly-coupled memory...we've done this with other projects, where we've had slightly different requirements. We've chosen a 100 ms ping-pong time to ensure we capture enough data to perform a specific analysis that ideally requires that time frame to get good statistics.

We could probably go down to 50 ms and collect data in a different way for that specific analysis, while still having time to process the data with our IIR and FIR filters.

I will look at implementing that as a solution and see how much time we can save.

The trick is to de-interleave as you read the data for the processing. You cannot use the canned functions of course.

JW
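The on-the-fly approach described above can be sketched as follows (a toy example; the running sum is only a stand-in for a real IIR/FIR update): rather than copying into linear buffers first, each channel's processing reads the interleaved buffer directly with a stride equal to the sequence length.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy sketch of de-interleaving "as you read": walk the interleaved
 * DMA buffer with a stride of n_ranks, feeding each sample of one
 * channel straight into that channel's filter.  The running sum below
 * is only a placeholder for the real filter update. */
static int32_t process_channel(const uint16_t *in, size_t rank,
                               size_t n_ranks, size_t n_samples)
{
    int32_t state = 0;                      /* placeholder filter state */
    for (size_t s = 0; s < n_samples; ++s)
        state += in[s * n_ranks + rank];    /* strided read, no copy */
    return state;
}
```

This skips the separate 39 ms copy pass entirely, at the cost of every filter doing strided (less cache-friendly) reads.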

Nev
Associate II

@waclawek.jan Since we are not just consuming the data in our signal processing chain, but also "capturing" it to obtain what you might call pre-trigger information, de-interleaving on the fly won't save us any time, but thanks for the suggestion.

@TDK A simple change in the linker file to put the ping-pong buffers in internal RAM has saved 13 ms, so that's a great suggestion.
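For anyone following along, the linker-side change can look something like this (a sketch for GCC/arm-none-eabi; the section name and sizes are assumptions, and the matching region must exist in the linker script):

```c
#include <stdint.h>

#define ADC1_SEQ_LEN 5       /* conversions per trigger (assumed) */
#define N_SAMPLES    5000    /* samples per 100 ms half-buffer (assumed) */

/* ".adc_buffers" is an assumed section name; the linker script must
 * map it to an internal SRAM region that the DMA masters can actually
 * reach on this part.  Buffers in cacheable RAM on the M7 also need
 * cache maintenance or an MPU region marked non-cacheable before the
 * CPU reads DMA-written data. */
__attribute__((section(".adc_buffers"), aligned(4)))
static uint16_t adc1_pingpong[2][N_SAMPLES * ADC1_SEQ_LEN];
```

In the .ld file this would pair with an output section along the lines of `.adc_buffers (NOLOAD) : { *(.adc_buffers) } > RAM`, with `RAM` being whichever internal region is chosen.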

I think we can consider this thread closed, but if anyone would like to contribute further, then please feel free 🙂

> we are not just consuming the data in our signal processing chain, but also "capturing" the data to be able to get what you might call pre-trigger information

That sounds like just another opportunity where you read the data anyway; so why not do it in the order it comes from the DMA and, once it has been read, store it in whatever order the next step requires?

My general point is that "building bricks" methods often trade efficiency for convenience. That does not mean there necessarily exists a better solution for your particular situation, only that there is no universal best solution for all situations.

While this is universally true, in general-purpose programming efficiency (whatever that means) has not been a particular concern for the last couple of decades. That is not so with microcontrollers, especially if they are genuinely micro and genuinely meant to control things (while the chipmakers want to make us believe the opposite). Sadly, however, Gordon Moore is no longer with us.

JW

 

Nev
Associate II

@waclawek.jan Point taken...there are many ways to do these things. BTW I avoid the HAL stuff and just use the LL drivers. FYI I cut my teeth on 6502 and Z80 assembler, so I like to be close to the register level 😉

TDK
Guru

If you really need to eke out every bit of performance, doing the transfer in assembly can speed things up quite a bit, perhaps by a factor of 4, depending on how inefficient your C code was.
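As a middle ground before dropping to assembly, the same transform can often be written in C so the compiler emits comparable code (a sketch, assuming a 5-conversion sequence; `restrict` promises the compiler the buffers don't alias, and unrolling one full sequence per iteration lets it batch the loads):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the kind of transform hand assembly (or carefully written
 * C) enables on the M7: one whole 5-rank sequence is consumed per
 * iteration, and 'restrict' lets the compiler schedule the loads and
 * stores back-to-back without re-reading memory. */
static void deinterleave5(const uint16_t *restrict in,
                          uint16_t *restrict c0, uint16_t *restrict c1,
                          uint16_t *restrict c2, uint16_t *restrict c3,
                          uint16_t *restrict c4, size_t n_samples)
{
    for (size_t s = 0; s < n_samples; ++s) {
        c0[s] = in[0]; c1[s] = in[1]; c2[s] = in[2];
        c3[s] = in[3]; c4[s] = in[4];
        in += 5;
    }
}
```

Whether this matches hand assembly depends on the compiler and flags; inspecting the generated listing is the only way to be sure.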

If you feel a post has answered your question, please click "Accept as Solution".
Nev
Associate II

@TDK We have a copy of the ARM assembly language book...no one here is looking forward to going down that route; it's probably 20 years since any of us used assembly language 😉.

I think we'd consider an H7 processor if we need more processing power, rather than trying to squeeze the pips on the F7. However, since we already have a two-processor architecture (both F765), we may also have the option of offloading some processing 🙂.

Thanks for your input.