2018-05-04 06:24 AM
We are using an STM32F767ZI and copying a small array into a memory area that is mapped to an external FPGA.
Looking for how to make the copy in a single bulk operation.
The options are:
1. with a loop:
while (i < destMemoryLength) {
    destMemoryAddress[i] = srcMemoryBuffer[i];
    i++;
}
This is probably using the CPU for each copy (guessing it's an assembly MOV operation for each).
2. with C memcpy(dest, src, size):
memcpy(destMemoryAddress, srcMemoryBuffer, sizeof(srcMemoryBuffer));
//Q: Is memcpy optimized to use a single CPU instruction, or does it use a loop internally?
3. Is there another HAL/CMSIS/other command or function that does a bulk copy?
(In other words, what should we do to ensure a bulk copy, and not a byte-by-byte copy?)
Thanks!
#memory-to-memory #stm32-f7 #memcpy
2018-05-04 07:45 AM
It's going to depend on the compiler. memcpy() is usually optimized, both by aligning addresses and by using load/store-multiple instructions (LDM/STM), i.e. fetch 8 words, write 8 words.
>>this is probably using the CPU for each copy. (guessing it's an assembly MOV operations for each).
You shouldn't have to guess: the processor is well documented, and the compiler listing files or disassembly can be reviewed.
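As a starting point for that kind of inspection, here is a minimal sketch of a word-wise copy. The function name `copy_words` is made up for illustration, and it assumes both pointers are 32-bit aligned and the length is given in words. The destination is declared volatile so the compiler keeps each 32-bit store as written, which is usually what you want for a memory-mapped peripheral.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies `words` 32-bit words from src to dst.
 * Assumes both pointers are 4-byte aligned.
 * dst is volatile, so every store is emitted as a separate 32-bit access. */
static void copy_words(volatile uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}
```

Compiling this with your own toolchain and optimization level, then reading the listing, shows which load and store instructions are actually generated for the loop.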
2018-05-04 10:04 AM
You could use memory-to-memory DMA.
In https://www.embedded.fm/blog/2017/2/27/dma-examples, I was able to get a 27% advantage in data transfer speed using DMA over the iterative code.
BTW, use memmove instead of memcpy. memcpy has problems with overlapping buffers (its behavior is undefined when source and destination overlap).
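To illustrate the overlap point, here is a small sketch with a made-up helper, `shift_right`, that moves data within a single buffer. Source and destination overlap here, so memmove is the safe call. When the source is an SRAM array and the destination is a separate FPGA-mapped region, the two presumably never overlap, and memcpy is fine in that respect.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: moves the first `count` bytes of buf up by `shift`
 * positions. The caller must make sure buf holds at least count + shift bytes.
 * Source and destination overlap, so memmove is used; memcpy would be
 * undefined behavior here. */
static void shift_right(unsigned char *buf, size_t count, size_t shift)
{
    memmove(buf + shift, buf, count);
}
```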
2018-05-04 10:18 AM
Thanks Clive. I'm not sure whether that means memcpy() does the copy in a single bulk operation. Could you please clarify?
I've rephrased the question a bit to: 'what should we do to ensure a bulk copy?'
Thx!
2018-05-04 10:29 AM
Not sure what constitutes a 'bulk copy' in this context. Something that is atomic?
There really isn't an uninterruptible form. The copy can occur as words rather than bytes, and it can be done as multiple words at a time, limited by the available registers you want to commit to the task.
The buses are somewhat decoupled: writes go through write buffers and are thus deferred. The processor can outrun the bus bandwidth, at which point the pipeline will stall.
If you don't want to get your hands dirty with the microprocessor-level functionality, memcpy() is likely to be the most optimized, or inlined, method.
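For anyone who does want to try the multiple-words approach by hand, a rough sketch follows. The name `copy_words_x4` and the block size of four are arbitrary choices for illustration, and 4-byte alignment plus non-overlapping buffers are assumed. Each pass loads four words into locals before storing them, which gives the compiler the chance to use load/store-multiple instructions. Whether it actually does depends on the toolchain and optimization level, so check the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies `words` 32-bit words, four per iteration.
 * Assumes 4-byte aligned, non-overlapping buffers (hence restrict). */
static void copy_words_x4(uint32_t *restrict dst,
                          const uint32_t *restrict src,
                          size_t words)
{
    size_t i = 0;

    /* Main loop: four loads into locals, then four stores. */
    for (; i + 4 <= words; i += 4) {
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* Tail: copy the remaining zero to three words one at a time. */
    for (; i < words; i++) {
        dst[i] = src[i];
    }
}
```

This is still interruptible between any two instructions, so it does not make the transfer atomic; it only reduces loop overhead per word.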
2018-05-04 10:39 AM
Keil will inline memcpy() on occasion, and the library will also fold bytes into words.
2018-05-04 09:44 PM
The CPU can get faster than the DMA, 35% better than the iterative example if one tries a little.
63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
85 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
63 manual CRCs per second. And the CRC for this batch is 74802052
84 manual CRCs per second (X). And the CRC for this batch is 74802052
81 manual CRCs per second (Y). And the CRC for this batch is 74802052
85 manual CRCs per second (Z). And the CRC for this batch is 74802052
If the CRC hardware was designed differently we could probably double the speed.