2017-07-26 05:34 PM
Hello everyone,
Before starting a project that involves digital signal processing, I'm doing some tests with the STM32F746G-DISCO to evaluate the capabilities of the board. In particular, I've measured (by toggling GPIOI_PIN_1) the execution time of a NOP: 60 ns.
I'm using CubeMX and I've set up the clock configuration to run at 216 MHz (the maximum frequency). I've also enabled, in the 'Cortex_M7 Configuration' section: TCM Interface, ART ACCELERATOR, Instruction Prefetch, CPU ICache and CPU DCache.
I'm a little bit puzzled, because 60 ns for a NOP is at odds with a system core clock running at 216 MHz. I'm sure I'm doing something wrong, but I really don't understand where. I've checked the RCC registers and their content is consistent with the code generated by CubeMX.
Is there any chance that I've made a mistake? The related documents (datasheet, reference manual, programming manual, etc.) don't contain the information I'm looking for. With pipelining, one machine cycle should correspond to one instruction, so why does this happen?
Sorry for my bad English, this is the first time I've posted on an international forum. Thanks for your attention.
#cortex-m #arm #stm32f7 #stm32-cube-mx #cycle-machine #nop #execution-time
2017-07-31 08:16 AM
As I said in my previous post, today I was able to run some tests.
Instead of measuring the NOP, I preferred to measure the execution time of a math operation, in this case a division. I found AN4044, "Floating Point Unit demonstration on STM32 microcontrollers", which reports the number of machine cycles associated with each math operation:
So I wrote this code:
I've checked in the debugger that the assembly code matches the FPU instruction:
The time for 10 division operations is 1.08 µs, so dividing this interval by 10, the time for a single division is 108 ns.
Dividing 108 ns by 20 (14 cycles + 6 cycles) gives a time per cycle of 5.4 ns. Is this consistent with a system core clock of 216 MHz?
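As a rough cross-check of the arithmetic above, a small host-side C sketch can convert between cycles and nanoseconds. The helper names cycles_to_ns and ns_to_cycles are made up for illustration, and the 20-cycle figure is simply the 14 + 6 quoted in the post.

```c
#include <stdint.h>

/* Duration in nanoseconds of `cycles` core clock cycles at clock rate `hz`. */
double cycles_to_ns(uint32_t cycles, uint32_t hz)
{
    return (double)cycles * 1e9 / (double)hz;
}

/* Number of core clock cycles that fit into `ns` nanoseconds at clock rate `hz`. */
double ns_to_cycles(double ns, uint32_t hz)
{
    return ns * (double)hz / 1e9;
}
```

With these helpers, one cycle at 216 MHz comes out at about 4.63 ns and 20 cycles at about 92.6 ns, so a measured 108 ns corresponds to roughly 23 cycles rather than 20. Put the other way round, 5.4 ns per cycle would correspond to a clock of about 185 MHz.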
2017-07-31 11:21 AM
Here's a suggestion: use DWT_CYCCNT to count cycles.
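A minimal sketch of that cycle-counter approach, assuming the standard ARMv7-M debug register addresses and bare register access instead of the CMSIS headers. The function names cyccnt_enable, cyccnt_read and cycles_elapsed are hypothetical, and the lock-access write is an assumption based on reports that some Cortex-M7 parts need it before the DWT can be configured.

```c
#include <stdint.h>

/* ARMv7-M debug registers, written as raw addresses (no CMSIS assumed). */
#define DEMCR      (*(volatile uint32_t *)(uintptr_t)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)(uintptr_t)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)(uintptr_t)0xE0001004u)
#define DWT_LAR    (*(volatile uint32_t *)(uintptr_t)0xE0001FB0u)

/* Enable the free-running cycle counter. Only meaningful on the target. */
static inline void cyccnt_enable(void)
{
    DEMCR |= (1u << 24);     /* TRCENA: enable the DWT unit */
    DWT_LAR = 0xC5ACCE55u;   /* assumption: unlock write, may be needed on Cortex-M7 */
    DWT_CYCCNT = 0u;         /* start counting from zero */
    DWT_CTRL |= 1u;          /* CYCCNTENA: start the cycle counter */
}

/* Read the current cycle count. Only meaningful on the target. */
static inline uint32_t cyccnt_read(void)
{
    return DWT_CYCCNT;
}

/* Cycles between two readings; unsigned subtraction absorbs one 32-bit wrap. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On the board, the idea would be to call cyccnt_enable() once, read the counter immediately before and after the code under test, and pass both readings to cycles_elapsed(). Measuring an empty region first gives the overhead of the two reads, which can then be subtracted.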
Stop using C, and replicate the VDIV.F32 s2,s0,s1 10 or 100 times in assembler. This will show the execution time of the instruction itself, not the pairing or pipeline stalls that other sequences might introduce.
Is 14 cycles the maximum? Could certain data foreshorten this?
Multiplication by a reciprocal could get you to 1 cycle for this division.
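The reciprocal idea can be sketched in plain C for the case where many values are divided by the same divisor. The function name scale_by_reciprocal is hypothetical, and the actual cycle counts depend on the compiler output and the FPU, so this only illustrates the transformation.

```c
#include <stddef.h>

/* Divide every element of x by the same divisor d, using a single
   division to form the reciprocal and one multiplication per element. */
void scale_by_reciprocal(float *x, size_t n, float d)
{
    const float inv = 1.0f / d;   /* the only division in the routine */
    for (size_t i = 0; i < n; ++i) {
        x[i] *= inv;              /* multiply instead of divide */
    }
}
```

Note that x * (1.0f / d) is not always bit-identical to x / d, since the reciprocal is rounded before the multiplication, so this trade is only appropriate where a difference in the last bit is acceptable.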
A compiler paying attention could fold this code.
2017-07-31 01:10 PM
Dear all,
NOP in all Cortex-M CPUs is not intended, by design, to be used for timing or counting cycles, unlike on legacy ARM7/ARM9 cores. It is meant for padding and for aligning data or code. Instead you can use, for example, {MOV R0, R0}. See this article on the ARM website:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/CHDJJGFB.html
Happy reading,
Cheers
STOne -32