Single bulk memory copy (instead of memcpy)

Roy Ben Hayun
Associate II
Posted on May 04, 2018 at 15:24

We are using an STM32F767ZI and copying a small array into a memory region that is mapped to an external FPGA.

We are looking for a way to perform the copy as a single bulk operation.

The options are:

1. with a loop:

        while(i < destMemoryLength){
            destMemoryAddress[i] = srcMemoryBuffer[i];
            i++;
        }

This probably uses the CPU for every element (I'm guessing one assembly MOV operation for each).

2. with C memcpy(dest, src, size):

        memcpy(destMemoryAddress, srcMemoryBuffer, sizeof(srcMemoryBuffer));

        //Q: Is memcpy optimized to use a single CPU instruction, or uses a loop internally?

3. Is there another HAL/CMSIS/other function that does a bulk copy?

(In other words, what should we do to ensure a bulk copy, and not a byte-by-byte copy?)

Thanks!

#memory-to-memory #stm32-f7 #memcpy
6 REPLIES
Posted on May 04, 2018 at 16:45

It's going to depend on the compiler. memcpy() is usually well optimized, both by aligning addresses and by using load/store multiple (LDM/STM) instructions, i.e. fetch 8 words, write 8 words.

>>this is probably using the CPU for each copy. (guessing it's an assembly MOV operations for each).

You shouldn't have to guess: the processor is well documented, and the compiler listing files or disassembly can be reviewed.
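As a minimal sketch of how to check this yourself, the two options from the question can be put into separate functions so they show up as distinct symbols in the listing. The function names and the file name `copy.c` are made up for illustration. Compiling with something like `arm-none-eabi-gcc -O2 -mcpu=cortex-m7 -S copy.c` should produce an assembly listing to compare; depending on compiler and optimization level, the loop may even be turned into a call to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Option 1 from the question: element-by-element loop.
 * (Hypothetical helper name, for inspecting the generated code.) */
void copy_loop(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
    }
}

/* Option 2 from the question: the library memcpy().
 * (Hypothetical helper name, for inspecting the generated code.) */
void copy_memcpy(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
}
```

Both functions give the same result; only the instructions the compiler emits for them differ, and that is what the listing shows.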

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
Posted on May 04, 2018 at 19:04

You could use memory-to-memory DMA.

In https://www.embedded.fm/blog/2017/2/27/dma-examples, I was able to get a 27% advantage in data transfer speed using DMA over the iterative code.

BTW, use memmove instead of memcpy. memcpy has undefined behavior when the source and destination buffers overlap.
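A small sketch of the overlap case, assuming a hypothetical helper `shift_right` that moves data within one buffer. Source and destination overlap here, so memmove() is the safe call. For a RAM buffer copied into a separate FPGA window the regions do not overlap, so this concern would not apply there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift the first `count` bytes of `buf` right by `shift` positions.
 * The caller must make sure `buf` holds at least count + shift bytes.
 * Source and destination overlap, so memmove() is used; calling
 * memcpy() on overlapping regions is undefined behavior. */
void shift_right(char *buf, size_t count, size_t shift)
{
    assert(buf != NULL);
    memmove(buf + shift, buf, count);
}
```

For example, with a 16-byte buffer holding "abcdef", `shift_right(buf, 6, 2)` leaves "ababcdef" in the first eight bytes.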

Posted on May 04, 2018 at 17:18

Thanks Clive. Not sure whether that means memcpy() copies in a single bulk operation. Could you please clarify?

I've rephrased the question a bit to: 'what should we do to ensure a bulk copy?'

Thx!
Posted on May 04, 2018 at 17:29

Not sure what counts as a 'bulk copy' in this context. Something that is atomic?

There really isn't an uninterruptible form. The copy can occur as words rather than bytes, and it can be done as multiple words at a time, limited by the number of registers you want to commit to the task.

The buses are somewhat decoupled: writes go through write buffers and are thus deferred. The processor can outrun the bus bandwidth, at which point the pipeline will stall.

If you don't want to get your hands dirty with processor-level details, memcpy() is likely to be the most optimized or inlined method.
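The word-at-a-time, several-words-per-pass idea can be sketched in plain C as below. The name `copy_words` and the choice of four words per iteration are assumptions for illustration, not anything from a library. The `volatile` qualifier on the destination is also an assumption, made because the target is memory-mapped; it may stop the compiler from merging the stores into a single store-multiple, so check the disassembly to see what is actually emitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `words` 32-bit words from src to dst, four words per iteration.
 * Both pointers must be 4-byte aligned and the regions must not overlap.
 * `volatile` on dst keeps the compiler from dropping or reordering the
 * stores, which matters for a memory-mapped peripheral (assumption). */
void copy_words(volatile uint32_t *dst, const uint32_t *src, size_t words)
{
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    /* Main loop: read four words into locals, then write four words. */
    while (words >= 4) {
        uint32_t w0 = src[0], w1 = src[1], w2 = src[2], w3 = src[3];
        dst[0] = w0;
        dst[1] = w1;
        dst[2] = w2;
        dst[3] = w3;
        src += 4;
        dst += 4;
        words -= 4;
    }

    /* Tail: copy the remaining 0 to 3 words one at a time. */
    while (words > 0) {
        *dst++ = *src++;
        words--;
    }
}
```

This is still interruptible between any two accesses; it only reduces loop overhead and gives the compiler the chance to use wider transfers.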

Posted on May 04, 2018 at 17:39

Keil will inline memcpy() on occasion, and the library will also fold bytes into words.

Posted on May 05, 2018 at 04:44

The CPU can be faster than the DMA: 35% better than the iterative example, if one tries a little.

63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
85 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052

If the CRC hardware was designed differently we could probably double the speed.
