Issue with the execution time of NOP instruction [STM32F746G-DISCO]

Ciavolino.Giuseppe
Associate II
Posted on July 27, 2017 at 02:34

Hello everyone,

Before starting a project that involves digital signal processing, I'm running some tests on the STM32F746G-DISCO to evaluate the capabilities of the board.

In particular, I've measured (by toggling GPIOI_PIN_1) the execution time of a NOP: 60 ns.

I'm using CubeMX and I've set up the clock configuration to run at 216 MHz (the maximum frequency).

I've also enabled the following in the 'Cortex_M7 Configuration' section: TCM Interface, ART Accelerator, Instruction Prefetch, CPU ICache and CPU DCache.

I'm a little upset, because 60 ns for a NOP is inconsistent with a system core clock running at 216 MHz.

I'm sure I'm doing something wrong, but I really don't understand where. I've checked the RCC registers and their content is consistent with the code generated by CubeMX.

Is it possible that I've made a mistake? The related documents (datasheet, reference manual, programming manual, etc.) don't contain the information I'm looking for.

With pipelining, one machine cycle should correspond to one instruction... so why does this happen?
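
To put numbers on the mismatch described above, here is a minimal sketch of the arithmetic in C. The function names are hypothetical helpers, not part of any ST or CMSIS library; the only inputs taken from the post are the 216 MHz clock and the 60 ns measurement.

```c
#include <stdint.h>

/* Duration of one clock cycle, in nanoseconds, for a clock given in Hz. */
double cycle_time_ns(uint32_t clock_hz)
{
    return 1e9 / (double)clock_hz;
}

/* Number of clock cycles that fit in a measured interval (in ns). */
double ns_to_cycles(double interval_ns, uint32_t clock_hz)
{
    return interval_ns * (double)clock_hz / 1e9;
}
```

At 216 MHz one cycle lasts about 4.63 ns, so a 60 ns interval corresponds to roughly 13 core cycles rather than 1. Note that the measured interval also contains the GPIO toggle itself, not only the NOP.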

Sorry for the bad English, this is the first time I've posted on an international forum.

Thanks for your attention.

 

#cortex-m #arm #stm32f7 #stm32-cube-mx #cycle-machine #nop #execution-time
12 REPLIES
Ciavolino.Giuseppe
Associate II
Posted on July 31, 2017 at 17:16

As I said in the previous post, today I was able to run some tests.

Instead of measuring the NOP, I preferred to measure the execution time of a math operation, in this case division.

I found AN4044, "Floating Point Unit demonstration on STM32 microcontrollers", which reports the number of machine cycles associated with each math operation:

0690X00000607e9QAA.png

So I wrote this code:

0690X00000607lDQAQ.png

I've checked in the debugger that the assembly code matches the FPU instruction:

0690X00000607lJQAQ.png

The time for 10 division operations is 1.08 us, so dividing this interval by 10, the time for a single division is 108 ns.

Dividing 108 ns by 20 (14 cycles + 6 cycles) gives a time per cycle of 5.4 ns.

0690X00000602UDQAY.bmp

Is this consistent with a system core clock of 216 MHz?
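
As a rough cross-check of the figures above, the same numbers can be turned around in two ways: which clock frequency would 20 cycles in 108 ns imply, and how many cycles does 108 ns represent at 216 MHz? The sketch below uses hypothetical helper names and only the values quoted in this post.

```c
/* Clock frequency (MHz) implied by a given cycle count taking interval_ns. */
double implied_clock_mhz(double interval_ns, double cycles)
{
    return cycles / interval_ns * 1e3;
}

/* Cycle count that a measured interval represents at a given clock (MHz). */
double cycles_at_clock(double interval_ns, double clock_mhz)
{
    return interval_ns * clock_mhz / 1e3;
}
```

With 108 ns and 20 cycles the implied clock is about 185 MHz; at 216 MHz the same 108 ns is about 23.3 cycles. So the measurement is close to, but a few cycles above, the 20 cycles assumed from the application note.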

Posted on July 31, 2017 at 18:21

Here's a suggestion: use DWT_CYCCNT to count cycles.

Stop using C, and replicate the VDIV.F32 s2,s0,s1 10 or 100 times in assembler. This will show the execution time of the instruction itself, not the pairing or pipeline stalls that other sequences might introduce.
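
The two suggestions above could be combined roughly as follows. This is only a sketch under several assumptions: a GCC-style toolchain, raw register addresses instead of the CMSIS names (DWT->CYCCNT, CoreDebug->DEMCR), and a write to the DWT lock access register, which some Cortex-M7 parts need before the counter can be enabled. The helper names are hypothetical. The target-only part is guarded so the arithmetic helpers can also be compiled on a PC.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT samples. Unsigned subtraction
   gives the right answer across a single 32-bit wraparound. */
uint32_t dwt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Average cycles per instruction after removing measurement overhead. */
double cycles_per_instruction(uint32_t total, uint32_t overhead, uint32_t reps)
{
    if (reps == 0u || total < overhead) {
        return 0.0;
    }
    return (double)(total - overhead) / (double)reps;
}

#if defined(__arm__) && defined(__ARM_FP)
/* Target-only part: architectural debug register addresses (ARMv7-M). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#define DWT_LAR    (*(volatile uint32_t *)0xE0001FB0u)

void dwt_enable(void)
{
    DEMCR |= (1u << 24);    /* TRCENA: enable the trace/debug blocks */
    DWT_LAR = 0xC5ACCE55u;  /* assumption: unlock needed on this part */
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;         /* CYCCNTENA: start the cycle counter */
}

/* Time 100 back-to-back VDIV.F32 instructions on the same operands. */
double measure_vdiv(float a, float b)
{
    float q;
    uint32_t t0 = DWT_CYCCNT;
    uint32_t t1 = DWT_CYCCNT;   /* back-to-back reads estimate overhead */
    __asm volatile(".rept 100\n\t"
                   "vdiv.f32 %0, %1, %2\n\t"
                   ".endr"
                   : "=&t"(q)
                   : "t"(a), "t"(b)
                   : "memory");
    uint32_t t2 = DWT_CYCCNT;
    return cycles_per_instruction(dwt_elapsed(t1, t2),
                                  dwt_elapsed(t0, t1), 100u);
}
#endif
```

Calling dwt_enable() once and then measure_vdiv(x, y) returns an average cycle count per VDIV. Because every repetition writes the same destination register, the result reflects that particular dependency pattern and may not match a mixed instruction stream.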

Is 14 cycles the maximum? Could certain data foreshorten this?

Multiplication by a reciprocal could get you to 1 cycle for this division.

A compiler paying attention could fold this code.
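
As an illustration of the reciprocal idea mentioned above, here is a minimal sketch for the case where many values are divided by the same divisor. The function name is hypothetical, and the code assumes the divisor is nonzero and stays constant across the loop.

```c
/* Divide every element by the same divisor using one division
   plus n multiplications instead of n divisions. */
void scale_by_reciprocal(float *data, int n, float divisor)
{
    const float inv = 1.0f / divisor;  /* the only division, done once */
    for (int i = 0; i < n; ++i) {
        data[i] *= inv;                /* multiply replaces each divide */
    }
}
```

The result can differ from a true division in the last bit when the reciprocal is not exactly representable, so this trade is only appropriate where that rounding difference is acceptable.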

STOne-32
ST Employee
Posted on July 31, 2017 at 22:10

Dear all,

Unlike on legacy ARM7/ARM9 cores, NOP on all Cortex-M CPUs is not intended by design to be used for timing or counting cycles, but for padding and for aligning data or code. Instead, you can use {MOV R0, R0}, for example. See this article on the ARM website:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/CHDJJGFB.html

 

Happy reading,

Cheers

STOne -32