STM32F7 @ 192MHz: long data copy time?

LCE
Principal II

Hello again,

I recently measured how long the STM32F7 (192MHz) needs to copy one 32-bit buffer to another, and I'm a little shocked by how long it takes.

(Maybe I expect too much, coming from 8bit µCs with 20MHz, an STM32 still feels like a racing car compared to a horse carriage)

For time measurement the interrupts are disabled and the PTP registers are used.

It takes about 38µs to copy a uint32_t buffer[256] to another one. At 192MHz that is about 7300 cycles in total, or roughly 28 cycles per element. Okay, there's the for-loop comparison and increment, but still, 28 cycles?

Am I overlooking anything, or is that just how it is?
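The per-element figure comes from converting the measured time into CPU cycles and dividing by the number of elements. A minimal sketch of that arithmetic (the function name cycles_per_element is made up for illustration, it is not part of the project):

```c
#include <stdint.h>

/* Convert a measured duration into CPU cycles per copied element.
 * elapsed_ns * cpu_mhz / 1000 gives the total cycle count; integer
 * division truncates, so the result is rounded down. */
uint32_t cycles_per_element(uint32_t elapsed_ns, uint32_t cpu_mhz, uint32_t elements)
{
    uint64_t total_cycles = ((uint64_t)elapsed_ns * cpu_mhz) / 1000u;
    return (uint32_t)(total_cycles / elements);
}
```

With 38000 ns, 192 MHz and 256 elements this gives 7296 total cycles, i.e. 28 cycles per element after truncation.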

I know it can be a long way from C to assembler, and I haven't checked that yet (never even looked at the STM32 / ARM assembler stuff), but the list excerpt below shows some 17 CPU actions for the for-loop - right?

Here's the uart/debug output:

*.c:

#define RPM_DMA_BUF_MAX		256
 
uint32_t u32RpmDmaBuf[2][RPM_DMA_BUF_MAX] = { { 0 } };
 
uint32_t u32RpmUartBuf[RPM_DMA_BUF_MAX];
uint32_t *pu32SrcBuf;
 
uint32_t u32StopNanoSec = 0;
uint32_t u32StartNanoSec = 0;
 
/* copy buffer with snapshot of timestamp before and after */ 
__disable_irq();
u32Val = ETH->PTPTSLR;
 
for( i16 = 0; i16 < RPM_DMA_BUF_MAX; i16++ )
{
    u32RpmUartBuf[i16] = *(pu32SrcBuf++);
}
u32Val2 = ETH->PTPTSLR;
__enable_irq();
 
u32StartNanoSec = ETH_PTPSubSecond2NanoSecond(u32Val);
u32StopNanoSec = ETH_PTPSubSecond2NanoSecond(u32Val2);
 
u32RpmDebugTime = 0;
if( u32StopNanoSec > u32StartNanoSec ) u32RpmDebugTime = u32StopNanoSec - u32StartNanoSec;
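/* uint32_t values are unsigned: print them with %lu, not %ld */
uart_printf("u32StopNanoSec  = %lu\n\r", (unsigned long)u32StopNanoSec);
uart_printf("u32StartNanoSec = %lu\n\r", (unsigned long)u32StartNanoSec);
uart_printf("u32buffer[%d] copy time: %lu ns\n\r", RPM_DMA_BUF_MAX, (unsigned long)u32RpmDebugTime);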

*.list

__disable_irq();
u32Val = ETH->PTPTSLR;
 8006b42:	4bad      	ldr	r3, [pc, #692]	; (8006df8 <UART3_RxCmndProcessing+0x17d8>)
 8006b44:	f8d3 370c 	ldr.w	r3, [r3, #1804]	; 0x70c
 8006b48:	f8c7 3498 	str.w	r3, [r7, #1176]	; 0x498
						for( i16 = 0; i16 < RPM_DMA_BUF_MAX; i16++ )
 8006b4c:	2300      	movs	r3, #0
 8006b4e:	f8a7 3504 	strh.w	r3, [r7, #1284]	; 0x504
 8006b52:	e010      	b.n	8006b76 <UART3_RxCmndProcessing+0x1556>
						{
							//u32RpmUartBuf[i16] = u32RpmDmaBuf[u8RpmBufPtr][i16];
							//u32RpmUartBuf[i16] = *(pu32SrcBuf++);
							*(pu32DstBuf++) = *(pu32SrcBuf++);
 8006b54:	f8d7 24ec 	ldr.w	r2, [r7, #1260]	; 0x4ec
 8006b58:	1d13      	adds	r3, r2, #4
 8006b5a:	f8c7 34ec 	str.w	r3, [r7, #1260]	; 0x4ec
 8006b5e:	f8d7 34e8 	ldr.w	r3, [r7, #1256]	; 0x4e8
 8006b62:	1d19      	adds	r1, r3, #4
 8006b64:	f8c7 14e8 	str.w	r1, [r7, #1256]	; 0x4e8
 8006b68:	6812      	ldr	r2, [r2, #0]
 8006b6a:	601a      	str	r2, [r3, #0]
						for( i16 = 0; i16 < RPM_DMA_BUF_MAX; i16++ )
 8006b6c:	f8b7 3504 	ldrh.w	r3, [r7, #1284]	; 0x504
 8006b70:	3301      	adds	r3, #1
 8006b72:	f8a7 3504 	strh.w	r3, [r7, #1284]	; 0x504
 8006b76:	f8b7 3504 	ldrh.w	r3, [r7, #1284]	; 0x504
 8006b7a:	2bff      	cmp	r3, #255	; 0xff
 8006b7c:	d9ea      	bls.n	8006b54 <UART3_RxCmndProcessing+0x1534>
						}
u32Val2 = ETH->PTPTSLR;
 8006b7e:	4b9e      	ldr	r3, [pc, #632]	; (8006df8 <UART3_RxCmndProcessing+0x17d8>)
 8006b80:	f8d3 370c 	ldr.w	r3, [r3, #1804]	; 0x70c
 8006b84:	f8c7 3494 	str.w	r3, [r7, #1172]	; 0x494
  __ASM volatile ("cpsie i" : : : "memory");
 8006b88:	b662      	cpsie	i
}
 8006b8a:	bf00      	nop
__enable_irq();
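One thing that stands out in the listing: the loop counter and both pointers are loaded from and stored back to the stack frame (the r7-relative ldr/str pairs) on every iteration. That is what GCC typically emits when building without optimization, so the build flags are probably worth checking, although that is only an assumption drawn from the listing. As a sketch, here are two ways of writing the copy that usually give an optimizing compiler more room. The function names copy_buf_memcpy and copy_buf_loop are hypothetical, only RPM_DMA_BUF_MAX comes from the code above:

```c
#include <stdint.h>
#include <string.h>

#define RPM_DMA_BUF_MAX 256

/* Variant 1: hand the whole copy to the C library. */
void copy_buf_memcpy(uint32_t *dst, const uint32_t *src)
{
    memcpy(dst, src, RPM_DMA_BUF_MAX * sizeof(uint32_t));
}

/* Variant 2: explicit loop with a native 32-bit index instead of a
 * 16-bit counter; restrict promises that the buffers do not overlap. */
void copy_buf_loop(uint32_t *restrict dst, const uint32_t *restrict src)
{
    for (uint32_t i = 0; i < RPM_DMA_BUF_MAX; i++) {
        dst[i] = src[i];
    }
}
```

Neither variant helps much if the compiler still keeps every local on the stack, so the actual gain has to be measured on the target.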

LCE
Principal II

Thanks KnarfB!

Interesting effects: at first the "asm volatile..." lines didn't do anything, but now, after some more cleans and builds, the timing is back where it should be.

So I'm not sure whether that was "compiler luck" or the lines you mentioned.

Anyway, thanks for the help - and the link about volatile.

Coming from the hardware side, "volatile" is something I somehow always tried to avoid and thus never really understood, just remembering that some volatile declaration troubled me a long time ago when working with some 8-bit controllers / compilers...
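For anyone finding this later: the exact lines KnarfB suggested are not quoted in this post, so the following is only a sketch of the general idea and an assumption about what they looked like. An empty asm volatile statement with a "memory" clobber is a common GCC-style compiler barrier. It stops the compiler from moving ordinary (non-volatile) memory accesses, like the buffer copy, across the two volatile timestamp reads. The names COMPILER_BARRIER, fake_timestamp_reg and timed_copy are hypothetical, and the variable stands in for a hardware register such as ETH->PTPTSLR:

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instruction,
 * but memory accesses may not be reordered across it at compile time. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

/* Hypothetical stand-in for a hardware timestamp register. */
volatile uint32_t fake_timestamp_reg;

/* Copy count words and return the raw difference of the two
 * timestamp reads taken before and after the copy. */
uint32_t timed_copy(uint32_t *dst, const uint32_t *src, uint32_t count)
{
    uint32_t start = fake_timestamp_reg;
    COMPILER_BARRIER();               /* copy must not start before this read */
    for (uint32_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
    COMPILER_BARRIER();               /* copy must be finished before the next read */
    uint32_t stop = fake_timestamp_reg;
    return stop - start;
}
```

This only constrains the compiler. It is not a hardware barrier such as DSB, and it does not by itself make the copy faster.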