2021-11-08 12:07 AM
Hello again,
I recently measured how long the STM32F7 (192 MHz) needs to copy one 32-bit buffer to another, and I'm a little shocked by how long it takes.
(Maybe I expect too much. Coming from 8-bit µCs at 20 MHz, an STM32 still feels like a racing car compared to a horse carriage.)
For time measurement the interrupts are disabled and the PTP registers are used.
It takes about 38 µs to copy a uint32_t buffer[256] to another one, which is about 28 cycles per element @ 192 MHz. Okay, there's the for-loop comparison and increment, but still, 28 cycles?
Am I overlooking anything, or is that just how it is?
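To make the arithmetic behind the "28 cycles" figure explicit, here is a minimal sketch of the conversion. The function name `cycles_per_element` is made up for illustration and is not part of any ST library; the inputs are the values from this post (38 µs, 192 MHz, 256 elements).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a measured duration into CPU cycles
 * per copied element (integer result, rounded down). */
uint32_t cycles_per_element(uint32_t elapsed_ns, uint32_t cpu_hz, uint32_t count)
{
    assert(count > 0u);
    /* cycles = ns * Hz / 1e9; the 64-bit intermediate avoids overflow */
    uint64_t cycles = ((uint64_t)elapsed_ns * cpu_hz) / 1000000000u;
    return (uint32_t)(cycles / count);
}
```

With 38000 ns at 192000000 Hz this gives 7296 cycles in total, so 28 per element after dividing by 256.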
I know it can be a long way from C to assembly, and I haven't checked that yet (I have never even looked at STM32 / ARM assembly), but the list excerpt below shows some 17 instructions for the for-loop, right?
Here's the uart/debug output:
*.c:
#define RPM_DMA_BUF_MAX 256
uint32_t u32RpmDmaBuf[2][RPM_DMA_BUF_MAX] = { { 0 } };
uint32_t u32RpmUartBuf[RPM_DMA_BUF_MAX];
uint32_t *pu32SrcBuf;
uint32_t u32StopNanoSec = 0;
uint32_t u32StartNanoSec = 0;
/* copy buffer with snapshot of timestamp before and after */
__disable_irq();
u32Val = ETH->PTPTSLR;
for( i16 = 0; i16 < RPM_DMA_BUF_MAX; i16++ )
{
    u32RpmUartBuf[i16] = *(pu32SrcBuf++);
}
u32Val2 = ETH->PTPTSLR;
__enable_irq();
u32StartNanoSec = ETH_PTPSubSecond2NanoSecond(u32Val);
u32StopNanoSec = ETH_PTPSubSecond2NanoSecond(u32Val2);
u32RpmDebugTime = 0;
if( u32StopNanoSec > u32StartNanoSec ) u32RpmDebugTime = u32StopNanoSec - u32StartNanoSec;
uart_printf("u32StopNanoSec = %lu\n\r", u32StopNanoSec);
uart_printf("u32StartNanoSec = %lu\n\r", u32StartNanoSec);
uart_printf("u32buffer[%d] copy time: %lu ns\n\r", RPM_DMA_BUF_MAX, u32RpmDebugTime);
*.list:
__disable_irq();
u32Val = ETH->PTPTSLR;
8006b42: 4bad ldr r3, [pc, #692] ; (8006df8 <UART3_RxCmndProcessing+0x17d8>)
8006b44: f8d3 370c ldr.w r3, [r3, #1804] ; 0x70c
8006b48: f8c7 3498 str.w r3, [r7, #1176] ; 0x498
for( i16 = 0; i16 < RPM_DMA_BUF_MAX; i16++ )
8006b4c: 2300 movs r3, #0
8006b4e: f8a7 3504 strh.w r3, [r7, #1284] ; 0x504
8006b52: e010 b.n 8006b76 <UART3_RxCmndProcessing+0x1556>
{
//u32RpmUartBuf[i16] = u32RpmDmaBuf[u8RpmBufPtr][i16];
//u32RpmUartBuf[i16] = *(pu32SrcBuf++);
*(pu32DstBuf++) = *(pu32SrcBuf++);
8006b54: f8d7 24ec ldr.w r2, [r7, #1260] ; 0x4ec
8006b58: 1d13 adds r3, r2, #4
8006b5a: f8c7 34ec str.w r3, [r7, #1260] ; 0x4ec
8006b5e: f8d7 34e8 ldr.w r3, [r7, #1256] ; 0x4e8
8006b62: 1d19 adds r1, r3, #4
8006b64: f8c7 14e8 str.w r1, [r7, #1256] ; 0x4e8
8006b68: 6812 ldr r2, [r2, #0]
8006b6a: 601a str r2, [r3, #0]
for( i16 = 0; i16 < RPM_DMA_BUF_MAX; i16++ )
8006b6c: f8b7 3504 ldrh.w r3, [r7, #1284] ; 0x504
8006b70: 3301 adds r3, #1
8006b72: f8a7 3504 strh.w r3, [r7, #1284] ; 0x504
8006b76: f8b7 3504 ldrh.w r3, [r7, #1284] ; 0x504
8006b7a: 2bff cmp r3, #255 ; 0xff
8006b7c: d9ea bls.n 8006b54 <UART3_RxCmndProcessing+0x1534>
}
u32Val2 = ETH->PTPTSLR;
8006b7e: 4b9e ldr r3, [pc, #632] ; (8006df8 <UART3_RxCmndProcessing+0x17d8>)
8006b80: f8d3 370c ldr.w r3, [r3, #1804] ; 0x70c
8006b84: f8c7 3494 str.w r3, [r7, #1172] ; 0x494
__ASM volatile ("cpsie i" : : : "memory");
8006b88: b662 cpsie i
}
8006b8a: bf00 nop
__enable_irq();
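Counting the excerpt above: of the 17 instructions, 3 only set up i16 once, and 14 run on every pass. Nine of those 14 are loads or stores, because i16 and both pointers are read from and written back to the stack frame (the offsets from r7) each time. That pattern looks like a build without optimization, but that is an assumption, since the compiler flags are not shown here. Below is a minimal sketch of the same copy using only function-local variables, which an optimizing build can typically keep in registers, plus a memcpy variant. The names `copy_words` and `copy_words_memcpy` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RPM_DMA_BUF_MAX 256

/* Word-by-word copy; index and pointers are locals, so the compiler
 * is free to hold them in registers when optimization is enabled. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}

/* Same copy delegated to the C library. */
void copy_words_memcpy(uint32_t *dst, const uint32_t *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}
```

Either function would be called as `copy_words(u32RpmUartBuf, pu32SrcBuf, RPM_DMA_BUF_MAX);` between the two timestamp reads. How much time that saves depends on the build settings and memory layout, so it has to be measured.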
2021-11-18 04:37 AM
Thanks KnarfB!
Interesting effects: at first the "asm volatile..." lines didn't do anything, but now, after some more cleans and builds, the timing is back where it should be.
So I am not so sure if that was "compiler luck" or the lines you mentioned.
Anyway, thanks for the help - and the link about volatile.
Coming from the hardware side, "volatile" is something I somehow always tried to avoid and thus never really understood, just remembering that some volatile declaration troubled me a long time ago when working with some 8-bit controllers / compilers...
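For readers wondering what such "asm volatile" lines can look like: the exact lines KnarfB suggested are not quoted in this post, so the following is only a sketch of one common GCC/Clang pattern, a compiler barrier around the timed region. It emits no instruction, but it stops the compiler from moving ordinary memory accesses across it, so the copy cannot drift outside the two timestamp reads. `fake_timestamp_reg` is a hypothetical stand-in for a hardware register such as ETH->PTPTSLR, and `timed_copy` is a made-up name.

```c
#include <stdint.h>

/* GCC/Clang compiler barrier: no instruction is emitted, but memory
 * accesses may not be reordered across this point by the compiler. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Hypothetical stand-in for a timestamp register; volatile forces a
 * real read at each access instead of a cached value. */
volatile uint32_t fake_timestamp_reg;

/* Copy count words and return the timestamp difference around the copy. */
uint32_t timed_copy(uint32_t *dst, const uint32_t *src, uint32_t count)
{
    uint32_t start = fake_timestamp_reg;
    COMPILER_BARRIER();              /* copy must not start before this */
    for (uint32_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
    COMPILER_BARRIER();              /* copy must be complete before this */
    uint32_t stop = fake_timestamp_reg;
    return stop - start;
}
```

The volatile qualifier only orders the two register reads relative to each other. The buffers themselves are not volatile, which is why the barrier is there as well.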