2016-02-10 07:47 AM
Hi, I'm using the DMA_FLASH_RAM example on the STM32F407 Discovery board.
Using SysTick for time measurements, I've noticed that memcpy gives better timing than DMA. For example, I have a 16-bit array of 64 elements to send. With memcpy the transfer took 293 SysTick ticks; with DMA it took 973. I only activate the DMA and wait for the status to change. Thanks for your assistance!
2016-02-10 08:00 AM
DMA doesn't have a cached view of memory, or write buffers. The setup time for DMA is also not insignificant.
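For anyone reproducing the timing in the question: SysTick is a 24-bit down-counter, so the elapsed count is the start reading minus the end reading, masked to 24 bits. A minimal sketch in C, assuming the reload value is left at 0xFFFFFF and the counter wraps at most once during the measurement. systick_elapsed is a hypothetical helper name; on the target the two readings would come from the SysTick current-value register.

```c
#include <stdint.h>

/* SysTick is a 24-bit down-counter, so readings lie in 0..0xFFFFFF. */
#define SYSTICK_MASK 0x00FFFFFFu

/* Hypothetical helper: ticks elapsed between two readings of the
 * current-value register. Assumes the reload value is 0xFFFFFF and
 * that the counter wrapped at most once between the readings. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    /* The counter counts down, so start - end is the elapsed count;
     * the mask folds a wrap through zero back into range. */
    return (start - end) & SYSTICK_MASK;
}
```

With readings of 1000 before and 707 after the copy, this gives 293 ticks.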
2016-02-10 09:17 AM
Your results depend on several factors. What compiler are you using? And what about the libc release?
For example, the newlib library provides a speed-optimized version of memcpy(), which automatically detects word-aligned memory transfers. The newlib-nano memcpy(), being optimized for size, doesn't perform this type of check. Moreover, you should also try word-aligned DMA M2M transfers: on some STM32 MCUs you can achieve more than a 4x speed-up.
These are some test results I've obtained on a wide range of STM32 microcontrollers:
2016-02-10 04:13 PM
2016-02-10 10:11 PM
There is just one other thing to take into account: linking newlib, which provides the performance-optimized version of memcpy(), costs about 10K of additional flash memory. For smaller STM32 MCUs (e.g. low-cost F0 parts with less than 32K of flash), doing it with DMA is preferable. Moreover, as clive stated, the DMA setup cost is not negligible, so a DMA M2M transfer makes sense only if you have at least 30-50 elements to copy.
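The break-even rule above could be wrapped in a small dispatcher along these lines. This is only a sketch: the threshold of 30 elements is taken from the low end of the estimate above and should be measured on the actual part, and dma_m2m_copy, copy_u16 and DMA_MIN_ELEMENTS are hypothetical names. The DMA path is a stand-in that calls memcpy() so the sketch builds on a host; on the target it would start the M2M stream and wait for transfer-complete.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed break-even point (low end of the 30-50 element estimate);
 * measure on the target before relying on it. */
#define DMA_MIN_ELEMENTS 30u

/* Hypothetical stand-in for a real DMA M2M transfer. On the target this
 * would start the stream and wait for the transfer-complete flag; here it
 * just calls memcpy() so the sketch builds and runs on a host. */
static void dma_m2m_copy(uint16_t *dst, const uint16_t *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}

/* Copies count 16-bit elements. Returns 1 when the DMA path was chosen,
 * 0 when the CPU did the copy. */
static int copy_u16(uint16_t *dst, const uint16_t *src, size_t count)
{
    if (count >= DMA_MIN_ELEMENTS) {
        dma_m2m_copy(dst, src, count);
        return 1;
    }
    /* Short transfers: the DMA setup cost would dominate, so use the CPU. */
    memcpy(dst, src, count * sizeof *dst);
    return 0;
}
```

With this threshold, the 64-element array from the original question would go through the DMA path, while an 8-element copy would stay on the CPU.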
2016-02-10 11:00 PM
My code probably spends about 2% of execution time in memcpy, so I have the following naive implementation:
void MemCpy(void *dst, const void *src, u32 cnt)
{
    // Copy longwords, taking advantage of STM ability to read/write unaligned data
    while (cnt >= 4)
    {
        *(u32 *)dst = *(const u32 *)src;
        dst = (u8 *)dst + 4;
        src = (const u8 *)src + 4;
        cnt -= 4;
    }
    // Copy the couple of leftover bytes
    while (cnt--)
    {
        *(u8 *)dst = *(const u8 *)src;
        dst = (u8 *)dst + 1;
        src = (const u8 *)src + 1;
    }
}
When compiled with -O2 under GCC, it requires 46 bytes of flash.
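The word loop in this implementation relies on the core handling unaligned word accesses in hardware, which the Cortex-M4 on the F407 does. On a core that faults on them (Cortex-M0, as in the F0 parts mentioned earlier in the thread), a variant would have to check alignment first, roughly the kind of check described for the newlib memcpy() above. A minimal sketch, not the newlib code: MemCpyAligned is a hypothetical name, and it simply falls back to a byte loop whenever either pointer is misaligned, where a fuller version would copy leading bytes until both pointers line up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; not from the thread or from any library. */
static void MemCpyAligned(void *dst, const void *src, size_t cnt)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Use word copies only when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        while (cnt >= 4) {
            *(uint32_t *)(void *)d = *(const uint32_t *)(const void *)s;
            d += 4;
            s += 4;
            cnt -= 4;
        }
    }

    /* Leftover bytes, or the whole buffer when a pointer was misaligned. */
    while (cnt--) {
        *d++ = *s++;
    }
}
```

The extra test and the second pointer pair cost some flash over the 46-byte version, so on a Cortex-M3/M4 the simpler loop may still be the better trade.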